Local and Global Inference for High Dimensional 
Nonparanormal Graphical Models 


Quanquan Gu, Yuan Cao, Yang Ning, Han Liu


Abstract 

This paper proposes a unified framework to quantify local and global inferential 
uncertainty for high dimensional nonparanormal graphical models. In particular, we 
consider the problems of testing the presence of a single edge and constructing a uniform 
confidence subgraph. Due to the presence of unknown marginal transformations, we 
propose a pseudo likelihood based inferential approach. In sharp contrast to the existing 
high dimensional score test method, our method is free of tuning parameters given an initial 
estimator, and extends the scope of the existing likelihood based inferential framework. 
Furthermore, we propose a U-statistic multiplier bootstrap method to construct the 
confidence subgraph. We show that the constructed subgraph is contained in the true 
graph with probability greater than a given nominal level. Compared with existing 
methods for constructing confidence subgraphs, our method does not rely on Gaussian or 
sub-Gaussian assumptions. The theoretical properties of the proposed inferential methods 
are verified by thorough numerical experiments and real data analysis. 

Keywords: Pseudo likelihood, Nonparanormal Graphical Models, Gaussian Copula Graphical Models, 
Sparsity, Hypothesis Test, Confidence Interval, High-dimensional Inference 


1 Introduction 

Graphical models (Lauritzen, 1996) have been widely used to explore the dependence structure
of multivariate distributions. In the Gaussian graphical model, a $d$-dimensional random vector
$X = (X_1, \ldots, X_d)^\top \in \mathbb{R}^d$ follows a multivariate normal distribution $N(0, \Sigma)$. It corresponds
to an undirected graph $G = (V, E)$, where $V$ contains nodes corresponding to the $d$ variables in
$X$, and the edge set $E$ describes the conditional independence relationships among $X_1, \ldots, X_d$.
It is well known that the graph $G$ is encoded by the sparsity pattern of the precision matrix
$\Theta = \Sigma^{-1}$. More specifically, no edge connects $X_j$ and $X_k$ if and only if $\Theta_{jk} = 0$. Therefore, the
graph estimation problem can be reduced to the estimation of the precision matrix $\Theta$, based
on $n$ independent observations sampled from $N(0, \Sigma)$. Such a problem is also known
as covariance selection (Dempster, 1972). A large body of literature (Meinshausen and
Bühlmann, 2006; Yuan and Lin, 2007; Banerjee et al., 2008; Friedman et al., 2008; Rothman
et al., 2008; Lam and Fan, 2009; Peng et al., 2009; Yuan, 2010; Cai et al., 2011; Ravikumar
et al., 2011; Shen et al., 2012; Jalali et al., 2012; Sun and Zhang, 2012b; Zhu et al., 2013,
2014; Yang et al., 2015) has studied the estimation problem for the precision matrix $\Theta$ under
different assumptions in the high dimensional setting, where the number of parameters is
much larger than the sample size, i.e., $d \gg n$.

Although the Gaussian graphical model has many appealing theoretical properties, the normality
assumption is restrictive. The inferred graph can be misleading if the data distribution
is not Gaussian. To relax the Gaussian distribution assumption, Liu et al. (2009, 2012); Xue
and Zou (2012) extended the Gaussian graphical model to the more flexible nonparanormal
graphical model, which is also known as the Gaussian copula model (Klaassen and Wellner,
1997). A random vector $X$ is said to belong to a nonparanormal family if there exists a set of
univariate monotonic functions $\{f_j\}_{j=1}^d$ such that $f(X) = [f_1(X_1), \ldots, f_d(X_d)]^\top \sim N(0, \Sigma)$.
While parameter estimation and graph recovery are studied by Liu et al. (2012); Xue and
Zou (2012), how to quantify the uncertainty of the estimation remains largely unknown.

This paper proposes a unified framework for local and global inference on high dimensional
nonparanormal graphical models. In particular, we consider two types of inferential problems:
(1) testing the presence of a single edge, i.e., $H_0: \Theta_{jk} = 0$, and (2) constructing a confidence
subgraph $\hat G$ satisfying $\mathbb{P}(\hat G \subseteq G^*) \ge 1 - \alpha$ asymptotically, where $G^*$ is the true graph, and
$\alpha \in (0, 1)$ is a given significance level. The first inferential problem arises when testing the
independence between two variables $X_j$ and $X_k$ is of interest. In addition, the confidence
subgraph $\hat G$ in the second inferential problem characterizes the dependence structure among
all variables with a given confidence level. Compared to the recent work on high-dimensional
inference (Belloni et al., 2012; Bühlmann, 2013; Zhang and Zhang, 2014; Javanmard and
Montanari, 2013; van de Geer et al., 2013; Lockhart et al., 2014; Taylor et al., 2014; Ning and
Liu, 2014), our work has the following novel contributions.

First, to eliminate the unknown nuisance functions $\{f_j\}_{j=1}^d$, we propose a pseudo likelihood
approach. However, the existing high dimensional score test method (Ning and Liu, 2014)
cannot be applied to construct a score test here, because it needs to solve a Dantzig selector
problem, which turns out to be of size $d^2 \times d^2$ for nonparanormal graphical models and is
computationally prohibitive. In contrast, our proposed pseudo likelihood approach does not
require such a procedure. Thus, given an initial estimator, our method is free of tuning
parameters. In particular, we establish the asymptotic guarantees on the type I error as well
as the local power of the proposed pseudo score test. Our pseudo likelihood method and
theory address the unique challenge in the inference for high dimensional nonparanormal
graphical models. They extend the scope of the existing likelihood based inferential framework
and are of independent interest.

Second, we propose a novel U-statistic multiplier bootstrap method to construct the confidence
subgraph. The existing method and theory of multiplier bootstrap are mainly developed
for approximating the distribution of a sum of independent random variables (Chernozhukov
et al., 2013). However, due to the use of the pseudo likelihood, we show that the proposed
test statistic is approximated by the maximum of high dimensional U-statistics. To address
the challenges caused by the sum of nonindependent random variables, we develop a general
method for bootstrapping U-statistics. In particular, we propose to apply the multiplier
bootstrap to the leading terms of their Hoeffding decompositions. Based on the U-statistic
bootstrap method, we can construct a confidence subgraph $\hat G$ and we prove that it satisfies
$\mathbb{P}(\hat G \subseteq G^*) \ge 1 - \alpha$, as $n \to \infty$. The proof of this result requires a more refined analysis of
the Hoeffding decomposition for high dimensional nonlinear transformations of U-statistics.
This technique is of independent theoretical interest. Compared with existing work on
constructing confidence subgraphs for Gaussian graphical models (Drton and Perlman, 2007;
Drton et al., 2008; Wasserman et al., 2013, 2014), our method does not rely on Gaussian or
sub-Gaussian assumptions.

1.1 Further Comparison with Related Work 

There are several recent works on asymptotic inference in the context of Gaussian graphical
models. For example, Jankova and van de Geer (2013) extended the de-sparsified method
(van de Geer et al., 2013) to the Gaussian graphical model. They required the irrepresentable
condition (Ravikumar et al., 2011) and the assumption $s^3 (\log d)/n = o(1)$, where
$s = \max_{1 \le j \le d} \sum_{k=1}^d 1(\Theta_{jk} \neq 0)$. In contrast, we do not need the irrepresentable assumption
and we improve their results in the sense that a weaker assumption $s^2 \log d/n = o(1)$ is
sufficient for our local inference. In addition, Ren et al. (2013); Chen et al. (2015) proposed
a scaled Lasso (Sun and Zhang, 2012a) based approach to test the presence of an edge in 
the Gaussian graphical model and covariate-adjusted Gaussian graphical model, and Liu 
(2013) further developed a new procedure to control the false discovery rate. Nevertheless, 
their inference method cannot be extended to nonparanormal graphical models, because 
they directly manipulate the residuals in the node-wise regression, which is not available in 
nonparanormal graphical models. We note that, while a larger family of graphical models 
is considered, our assumption $s^2 \log d/n = o(1)$ achieves the best possible scaling for the
Gaussian graphical model (Ren et al., 2013; Liu, 2013).

1.2 Organization of the Paper 

The remainder of this paper is organized as follows. We briefly review the nonparanormal 
graphical model in Section 2. In Section 3, we propose a new pseudo score test for local 
graph inference, and a global graph inferential procedure for constructing subgraph confidence
intervals. In Section 4, we establish the asymptotic properties of the hypothesis testing 
procedures proposed in Section 3. Section 5 shows the numerical results on both synthetic 
and real-world datasets. Section 6 concludes this work with discussions. 

1.3 Notation 

We summarize the notation to be used throughout the paper. Let $[d] = \{1, 2, \ldots, d\}$. Let
$A = [A_{ij}] \in \mathbb{R}^{d \times d}$ be a $d \times d$ matrix and $x = [x_1, \ldots, x_d]^\top \in \mathbb{R}^d$ be a $d$-dimensional vector.
$\mathrm{vec}(A)$ is the vectorization of $A$. For $0 < q < \infty$, we define the $\ell_0$, $\ell_q$ and $\ell_\infty$ vector norms as

$$\|x\|_0 = \sum_{i=1}^d 1(x_i \neq 0), \quad \|x\|_q = \Big(\sum_{i=1}^d |x_i|^q\Big)^{1/q}, \quad \text{and} \quad \|x\|_\infty = \max_{1 \le i \le d} |x_i|,$$

where $1(\cdot)$ represents the indicator function. We use the following notation for the matrix
$\ell_q$, $\ell_{\max}$, $\ell_{1,1}$ and Frobenius norms:

$$\|A\|_q = \max_{\|x\|_q = 1} \|Ax\|_q, \quad \|A\|_{\max} = \max_{i,j} |A_{ij}|, \quad \|A\|_{1,1} = \sum_{i,j=1}^d |A_{ij}|, \quad \|A\|_F = \Big(\sum_{i,j=1}^d |A_{ij}|^2\Big)^{1/2}.$$

For two matrices $A$ and $B$, we use $A \otimes B$ to denote the Kronecker product, and $A \odot B$ to
denote the Hadamard (elementwise) product. For a matrix $\Theta$ and an index set $S \subseteq [d] \times [d]$,
$\Theta_S$ denotes the set of numbers $[\Theta_{(jk)}]_{(jk) \in S}$. Let $\Phi(\cdot)$ denote the cumulative distribution
function (CDF) of the standard normal distribution, and $\Phi^{-1}(\cdot)$ denote its inverse function. For
a sequence of random variables $X_n$, we write $X_n \xrightarrow{p} X$ if $X_n$ converges in probability to $X$,
and $X_n \rightsquigarrow X$ if $X_n$ converges in distribution to $X$.



2 Nonparanormal Graphical Models 


In this section, we briefly review nonparanormal graphical models (Liu et al., 2009), which are a
semiparametric extension of the Gaussian graphical model. In particular, a random vector
$X = (X_1, \ldots, X_d)^\top$ follows a nonparanormal distribution, $X \sim NPN(f, \Sigma)$, if and only if there exists
a set of monotonic transformations $f = \{f_j\}_{j=1}^d$ such that $f(X) = [f_1(X_1), \ldots, f_d(X_d)]^\top \sim N(0, \Sigma)$
with $\mathrm{diag}(\Sigma) = 1$. Given $n$ independent observations $X_1, \ldots, X_n$, where
$X_i = (X_{i1}, \ldots, X_{id})^\top \sim NPN(f, \Sigma)$, Liu et al. (2012); Xue and Zou (2012) proposed rank-based
estimators, such as Spearman's rho or Kendall's tau, to estimate $\Sigma$, due to their invariance
under monotonic marginal transformations. For example, the Kendall's tau estimator is
defined as

$$\hat\tau_{jk} = \frac{2}{n(n-1)} \sum_{1 \le i < i' \le n} \mathrm{sign}\big[(X_{ij} - X_{i'j})(X_{ik} - X_{i'k})\big]. \tag{2.1}$$

Liu et al. (2012) showed that $\hat\tau_{jk}$ is an unbiased estimator of $\tau_{jk} = (2/\pi) \arcsin(\Sigma_{jk})$, and the
correlation matrix $\Sigma$ can be estimated by $\hat\Sigma = [\hat\Sigma_{jk}] \in \mathbb{R}^{d \times d}$, where

$$\hat\Sigma_{jk} = \sin\Big(\frac{\pi}{2} \hat\tau_{jk}\Big).$$

Once an estimate of $\Sigma$ is obtained, the existing procedures for the Gaussian graphical model
can be used to estimate $\Theta = \Sigma^{-1}$, which encodes the conditional independence structure
among $X$.
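As an illustration of the rank-based estimator, the following minimal sketch computes Kendall's tau as in (2.1) and applies the sine transform to obtain a correlation estimate. The function names and the toy data are hypothetical and chosen only for this example; it uses plain Python lists and assumes there are no ties in the data.

```python
import math
from itertools import combinations


def sign(v):
    """Sign of a real number: -1, 0 or 1."""
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """Kendall's tau as in (2.1): average concordance sign over all pairs i < i'."""
    n = len(x)
    total = sum(sign((x[i] - x[k]) * (y[i] - y[k]))
                for i, k in combinations(range(n), 2))
    return 2.0 * total / (n * (n - 1))


def rank_correlation_matrix(columns):
    """Sigma_hat[j][k] = sin(pi/2 * tau_hat_jk), with ones on the diagonal."""
    d = len(columns)
    sigma = [[1.0] * d for _ in range(d)]
    for j, k in combinations(range(d), 2):
        tau = kendall_tau(columns[j], columns[k])
        sigma[j][k] = sigma[k][j] = math.sin(0.5 * math.pi * tau)
    return sigma


# Toy data: y is a monotone increasing transform of x, z is a decreasing one.
x = [0.1, 0.4, 0.35, 0.8, 1.2]
y = [math.exp(3 * v) for v in x]
z = [-v for v in x]
w = [0.3, 0.1, 0.9, 0.5, 0.7]

sigma_hat = rank_correlation_matrix([x, y, z])
```

Because only the signs of pairwise differences enter, `kendall_tau(x, w)` and `kendall_tau(y, w)` coincide, which is the invariance under monotone marginal transformations that motivates the estimator.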

3 Inference for Nonparanormal Graphical Models 

Under the nonparanormal graphical models, $X_j$ and $X_k$ are independent given the remaining
variables if and only if $\Theta^*_{jk} = 0$. In this section, we first propose a pseudo score test for
$H_0: \Theta^*_{jk} = 0$, then consider how to construct the confidence subgraph.

3.1 Local Graph Inference: A Pseudo Score Approach 

Without loss of generality, assume that $\Theta$ can be partitioned as $\Theta = (\Theta_{jk}, \Theta_{(jk)^c})$, where
$\Theta_{jk} \in \mathbb{R}$ is the parameter of interest and $\Theta_{(jk)^c} \in \mathbb{R}^{d^2 - 1}$ is the nuisance parameter. We are
interested in testing $H_0: \Theta^*_{jk} = 0$ versus $H_1: \Theta^*_{jk} \neq 0$. In what follows, we consider a pseudo
score test approach.

Liu et al. (2009) showed that the log-likelihood function of the nonparanormal graphical
model contains unknown nuisance functions $\{f_1, \ldots, f_d\}$, and the plug-in estimation for
$\{f_1, \ldots, f_d\}$ results in a sub-optimal estimator of $\Theta$. This makes the likelihood based
inference in Ning and Liu (2014) infeasible. To overcome this problem, we define a pseudo
log-likelihood function as follows

$$\ell_n(\Theta) = -\mathrm{tr}(\hat\Sigma \Theta) + \log\det(\Theta), \tag{3.1}$$

where $\hat\Sigma$ is the estimated correlation matrix via Kendall's tau correlation coefficients as in (2.1).
The pseudo log-likelihood function in (3.1) is motivated by the log-likelihood function of the
Gaussian graphical model and is invariant to the marginal monotonic transformation functions
$\{f_1, \ldots, f_d\}$.

Then we can easily calculate the derivatives of the pseudo log-likelihood
function in (3.1) with respect to $\Theta_{jk}$ and $\Theta_{(jk)^c}$ as follows

$$\nabla_{(jk)} \ell_n(\Theta) = -\hat\Sigma_{jk} + [\Theta^{-1}]_{jk}, \qquad \nabla_{(jk)^c} \ell_n(\Theta) = \mathrm{vec}\big(-\hat\Sigma_{(jk)^c} + [\Theta^{-1}]_{(jk)^c}\big).$$

The above quantities are pivotal for the design and analysis of the hypothesis test. In addition,
the U-statistic structure of $\hat\Sigma_{jk}$ makes our analysis more challenging than that of the Gaussian
graphical model analyzed by Jankova and van de Geer (2013).
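The pseudo log-likelihood (3.1) and its entrywise gradient can be checked numerically. The sketch below is illustrative only: it assumes NumPy is available, the matrices are made-up toy values, and the function names are hypothetical.

```python
import numpy as np


def pseudo_loglik(theta, sigma_hat):
    """ell_n(Theta) = -tr(Sigma_hat Theta) + log det(Theta), as in (3.1)."""
    sign, logdet = np.linalg.slogdet(theta)
    if sign <= 0:
        raise ValueError("Theta must have positive determinant")
    return -np.trace(sigma_hat @ theta) + logdet


def pseudo_score(theta, sigma_hat):
    """Entrywise gradient of (3.1) at a symmetric Theta: -Sigma_hat + Theta^{-1}."""
    return -sigma_hat + np.linalg.inv(theta)


# Toy inputs (not from the paper): a correlation estimate and a candidate precision matrix.
sigma_hat = np.array([[1.0, 0.3, 0.1],
                      [0.3, 1.0, 0.2],
                      [0.1, 0.2, 1.0]])
theta = np.array([[2.0, -0.5, 0.0],
                  [-0.5, 2.0, -0.3],
                  [0.0, -0.3, 1.5]])

# Central finite difference in the single entry (j, k), compared with the formula.
j, k, eps = 0, 1, 1e-6
bump = np.zeros_like(theta)
bump[j, k] = eps
fd = (pseudo_loglik(theta + bump, sigma_hat)
      - pseudo_loglik(theta - bump, sigma_hat)) / (2 * eps)
analytic = pseudo_score(theta, sigma_hat)[j, k]
```

The gradient vanishes at $\Theta = \hat\Sigma^{-1}$, which mirrors the Gaussian likelihood that motivates (3.1).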

Given some initial estimator $\hat\Theta$ of $\Theta$, we define the pseudo score function for $\Theta_{jk}$ as
$\nabla_{(jk)} \ell_n(\Theta_{jk}, \hat\Theta_{(jk)^c})$, where $\hat\Theta_{(jk)^c}$ is the estimated nuisance parameter. The standard
M-estimator theory relies on the asymptotic normality of the score type function, say
$\nabla_{(jk)} \ell_n(\Theta_{jk}, \hat\Theta_{(jk)^c})$. However, this property breaks down in the high dimensional setting,
due to the regularization on the nuisance parameter estimator $\hat\Theta_{(jk)^c}$ (Ning and Liu, 2014).
In order to address the problem, we propose a decorrelated pseudo score function by removing
the effect of the estimated high dimensional nuisance parameters $\hat\Theta_{(jk)^c}$. More specifically,
the decorrelated pseudo score function $S_n^{(jk)}(\Theta)$ is constructed as a linear combination of the
pseudo score functions $\nabla \ell_n(\Theta)$, such that it achieves the robustness property with respect
to the nuisance parameter, i.e., $\mathbb{E}\{\partial S_n^{(jk)}(\Theta^*) / \partial \Theta_{(jk)^c}\} = 0$. Simple algebra shows that the
function that satisfies these requirements is given by

$$S_n^{(jk)}(\Theta) = -\hat\Sigma_{jk} + [\Theta^{-1}]_{jk} - (w^*_{(jk)})^\top \mathrm{vec}\big(-\hat\Sigma_{(jk)^c} + [\Theta^{-1}]_{(jk)^c}\big), \tag{3.2}$$

where $w^*_{(jk)} = H^{-1}_{(jk)^c,(jk)^c} H_{(jk)^c,(jk)}$ and $H = \Theta^{*-1} \otimes \Theta^{*-1} = \Sigma^* \otimes \Sigma^*$. Note that
$H^{-1}_{(jk)^c,(jk)^c}$ is the inverse of $H_{(jk)^c,(jk)^c}$ and it is a $(d^2 - 1) \times (d^2 - 1)$ matrix. The
superscript $(jk)$ of $S_n^{(jk)}(\Theta)$ in (3.2) indicates that it is the decorrelated score function for the
component $\Theta_{jk}$, and the subscript $(jk)$ of $w^*_{(jk)}$ indicates that it is the decorrelation vector
corresponding to the score function $S_n^{(jk)}(\Theta)$. Note that the pseudo score function $S_n^{(jk)}(\Theta)$
is a function of second-order U-statistics (Van der Vaart, 1998). To see this, we apply the mean
value theorem to expand the pseudo score function of $\Theta_{jk}$ as follows


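$$\nabla_{(jk)} \ell_n(\Theta) = T_{jk}(\Theta) \cdot \frac{2}{n(n-1)} \sum_{1 \le i < i' \le n} h^{ii'}_{jk}(\Theta), \tag{3.3}$$

where $T_{jk}(\Theta)$ is defined as

$$T_{jk}(\Theta) = \cos\Big\{ \xi \arcsin\big([\Theta^{-1}]_{jk}\big) + \frac{(1 - \xi)\pi}{n(n-1)} \sum_{1 \le i < i' \le n} \mathrm{sign}\big[(X_{ij} - X_{i'j})(X_{ik} - X_{i'k})\big] \Big\}$$

for some $\xi \in [0, 1]$, and $h^{ii'}_{jk}(\Theta)$ is a kernel function given by

$$h^{ii'}_{jk}(\Theta) = -\frac{\pi}{2} \mathrm{sign}\big[(X_{ij} - X_{i'j})(X_{ik} - X_{i'k})\big] + \arcsin\big([\Theta^{-1}]_{jk}\big). \tag{3.4}$$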

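Here, the superscript of $h^{ii'}_{jk}(\cdot)$ indicates that it is constructed using $X_i$ and $X_{i'}$, and the
subscript of $h^{ii'}_{jk}(\cdot)$ indicates that it is associated with $\Theta_{jk}$. Denote $T = [T_{jk}] \in \mathbb{R}^{d \times d}$ and
$G = [G_{jk}] \in \mathbb{R}^{d \times d}$, where $T_{jk} = T_{jk}(\Theta^*)$ and $G_{jk} = 2/(n(n-1)) \sum_{1 \le i < i' \le n} h^{ii'}_{jk}(\Theta^*)$. We have
$\nabla \ell_n(\Theta^*) = T \odot G$. Since $\nabla_{(jk)} \ell_n(\Theta)$ is the product of $T_{jk}(\Theta)$ and a second-order U-statistic,
we can characterize its limiting distribution using the Hájek projection (Van der Vaart, 1998).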
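In particular, we have

$$\frac{2}{n(n-1)} \sum_{1 \le i < i' \le n} h^{ii'}_{jk}(\Theta) = \frac{2}{n} \sum_{i=1}^n g_{jk}(X_i, \Theta) + \frac{2}{n(n-1)} \sum_{1 \le i < i' \le n} \big[ h^{ii'}_{jk}(\Theta) - h^{i|i'}_{jk}(\Theta) - h^{i'|i}_{jk}(\Theta) \big], \tag{3.5}$$

where $g_{jk}(X_i, \Theta) = h^{i|i'}_{jk}(\Theta)$ and $h^{i|i'}_{jk}(\Theta) = \mathbb{E}[h^{ii'}_{jk}(\Theta) \mid X_i]$ is the projection of
$h^{ii'}_{jk}(\Theta)$ onto the sigma algebra generated by $X_i$. Note that the first term on the right hand side
of (3.5) is a sum of i.i.d. random variables, and the second term is asymptotically negligible.
Therefore, $2/(n(n-1)) \sum_{1 \le i < i' \le n} h^{ii'}_{jk}(\Theta)$ can be approximated by $2/n \sum_{i=1}^n g_{jk}(X_i, \Theta)$.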

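Denote $\bar G = [\bar G_{jk}] \in \mathbb{R}^{d \times d}$ and $F = [F_{jk}] \in \mathbb{R}^{d \times d}$, where $\bar G_{jk} = 2/n \sum_{i=1}^n g_{jk}(X_i, \Theta^*)$
and $F_{jk} = \cos\{\arcsin([\Theta^{*-1}]_{jk})\} = \sqrt{1 - [\Theta^{*-1}]_{jk}^2} = \sqrt{1 - (\Sigma^*_{jk})^2}$. We can show that
$F_{jk} = T_{jk} + o_P(1)$ for all $(j, k) \in [d] \times [d]$. The pseudo score function of $\Theta$, i.e., $\nabla \ell_n(\Theta^*) = T \odot G$,
can be approximated by $F \odot \bar G$. Moreover, the pseudo score function $S_n^{(jk)}(\Theta^*)$ is a
linear combination of $\nabla \ell_n(\Theta^*)$, i.e., $S_n^{(jk)}(\Theta^*) = b_{(jk)}^\top \mathrm{vec}(T \odot G)$, where $b_{(jk)} = [1, -w^{*\top}_{(jk)}]^\top$.
In Section 4, we show that the variance of $\nabla \ell_n(\Theta^*)$ can be approximated by
$R = \mathbb{E}[\mathrm{vec}(F \odot G^{(i)}) \mathrm{vec}(F \odot G^{(i)})^\top]$, where $G^{(i)} = [G^{(i)}_{jk}]$ is a matrix such that $G^{(i)}_{jk} = g_{jk}(X_i, \Theta^*)$. Furthermore,
we prove that the limiting distribution of $\sqrt{n} S_n^{(jk)}(\Theta^*)$ is $N(0, 4\sigma^2)$, where
$\sigma^2 = R_{(jk),(jk)} - 2 R_{(jk),(jk)^c} w^* + w^{*\top} R_{(jk)^c,(jk)^c} w^*$ can be consistently estimated by some estimator $\hat\sigma^2$.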

In addition, in the decorrelated score function in (3.2), $w^*_{(jk)}$ is unknown. To make it
a practical test statistic, we also need to estimate it. By the definition of $w^*_{(jk)}$, a natural
estimator of $w^*_{(jk)}$ is as follows:

$$\hat w_{(jk)}(\hat\Theta) = \hat H^{-1}_{(jk)^c,(jk)^c} \hat H_{(jk)^c,(jk)}, \tag{3.6}$$

where $\hat w$ is a function of $\hat\Theta$. Since $H$ is a $d^2 \times d^2$ matrix, its inversion is computationally
prohibitive. Fortunately, by the property of the Kronecker product, we have $H^{-1} = \Theta^* \otimes \Theta^*$
and $\hat H^{-1} = \hat\Theta \otimes \hat\Theta$. In what follows, for notational simplicity, we drop the superscript and
subscript $(jk)$ in $S_n^{(jk)}(\Theta)$, $\hat S_n^{(jk)}(\Theta)$, $w^*_{(jk)}$ and $\hat w_{(jk)}$, when they can be inferred
from context.

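The following lemma shows that $w^*$ and $\hat w(\hat\Theta)$ can be efficiently calculated without
performing matrix inversion.

Lemma 3.1. Let $\Omega^* = H^{-1} = \Theta^* \otimes \Theta^*$ and $\hat\Omega = \hat H^{-1} = \hat\Theta \otimes \hat\Theta$. We have

$$w^* = -\frac{\Omega^*_{(jk)^c,(jk)}}{\Omega^*_{(jk),(jk)}} = -\frac{(\Theta^* \otimes \Theta^*)_{(jk)^c,(jk)}}{(\Theta^* \otimes \Theta^*)_{(jk),(jk)}}, \qquad \hat w(\hat\Theta) = -\frac{\hat\Omega_{(jk)^c,(jk)}}{\hat\Omega_{(jk),(jk)}} = -\frac{(\hat\Theta \otimes \hat\Theta)_{(jk)^c,(jk)}}{(\hat\Theta \otimes \hat\Theta)_{(jk),(jk)}}. \tag{3.7}$$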
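The computational point of the lemma can be illustrated numerically. The sketch below assumes that the decorrelation vector is defined as $H^{-1}_{(jk)^c,(jk)^c} H_{(jk)^c,(jk)}$ and compares the direct solve against the column read-off from $\Theta \otimes \Theta$. The sign follows the standard block-inverse identity and should be checked against the source's convention; the matrix and index choices are arbitrary toy values, and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
a_mat = rng.normal(size=(d, d))
theta = a_mat @ a_mat.T + d * np.eye(d)   # a positive definite toy precision matrix
sigma = np.linalg.inv(theta)

h = np.kron(sigma, sigma)        # H = Sigma (x) Sigma, size d^2 x d^2
omega = np.kron(theta, theta)    # H^{-1} = Theta (x) Theta, no inversion needed

j, k = 0, 2
a = j * d + k                              # position of (j, k) in the row-major vec
rest = np.delete(np.arange(d * d), a)      # positions of (jk)^c

# Direct route: solve with the (d^2 - 1) x (d^2 - 1) block of H.
w_direct = np.linalg.solve(h[np.ix_(rest, rest)], h[rest, a])

# Shortcut: read the vector off a single column of Theta (x) Theta.
w_shortcut = -omega[rest, a] / omega[a, a]
```

The shortcut touches only one column of a Kronecker product, so it costs $O(d^2)$ per edge instead of a $(d^2 - 1)$-dimensional linear solve.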

From Lemma 3.1, we can see that the estimator $\hat w(\hat\Theta)$ in (3.6) can be computed very
efficiently from $\hat\Theta$. This avoids the node-wise Lasso procedure for estimating a high dimensional
matrix in van de Geer et al. (2013) or the Dantzig estimator for estimating a high dimensional
vector in Ning and Liu (2014). Replacing $w^*$ in (3.2) with $\hat w(\hat\Theta)$, we can show that the
decorrelated pseudo score function reduces to


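$$\hat S_n(0, \hat\Theta_{(jk)^c}) = \frac{e_j^\top \tilde\Theta^\top \hat\Sigma \tilde\Theta e_k}{\hat\Theta_{jj} \hat\Theta_{kk}}, \tag{3.8}$$

where $\tilde\Theta = (0, \hat\Theta_{(jk)^c})$. In Section 4, we prove the asymptotic normality of $\sqrt{n} \cdot \hat S_n(0, \hat\Theta_{(jk)^c})$
under $H_0: \Theta^*_{jk} = 0$. In particular, we show that its limiting distribution is the same as the
limiting distribution of $\sqrt{n} \cdot S_n(0, \Theta^*_{(jk)^c})$, i.e., $N(0, 4\sigma^2)$. Hence, the score test statistic for
$H_0: \Theta^*_{jk} = 0$ is defined as

$$ST_n = \sqrt{n} \hat S_n(0, \hat\Theta_{(jk)^c}) / (2\hat\sigma), \tag{3.9}$$

where $\hat\sigma$ is the estimator of $\sigma$ (which will be explained in detail in Section 4). The score
test with significance level $\alpha$ is given by

$$\Psi_S(\alpha) = 1(ST_n^2 > \chi^2_{1\alpha}), \tag{3.10}$$

where $\chi^2_{1\alpha}$ is the $(1 - \alpha)$-th quantile of a $\chi^2_1$ random variable. The null hypothesis is rejected
if and only if $\Psi_S(\alpha) = 1$. The associated p-value of this test is $P_S = 2(1 - \Phi(|ST_n|))$.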
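The decision rule in (3.9) and (3.10) can be sketched as follows. The inputs `s_hat` (the decorrelated score) and `sd_hat` (the estimate of $\sigma$) are assumed to be already computed; the function name and the numbers are hypothetical. The sketch uses the fact that $ST_n^2 > \chi^2_{1\alpha}$ is the same event as $|ST_n| > \Phi^{-1}(1 - \alpha/2)$.

```python
import math
from statistics import NormalDist


def score_test(s_hat, sd_hat, n, alpha=0.05):
    """Return (ST_n, p-value, reject) for H0: Theta_jk = 0."""
    st = math.sqrt(n) * s_hat / (2.0 * sd_hat)              # statistic (3.9)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(st)))       # two-sided p-value
    # Squared statistic vs. chi-square(1) quantile, written on the normal scale.
    reject = abs(st) > NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return st, p_value, reject


# Made-up inputs for illustration.
st, p_value, reject = score_test(s_hat=0.12, sd_hat=0.4, n=400)
st0, p_value0, reject0 = score_test(s_hat=0.01, sd_hat=0.4, n=400)
```

With these toy numbers the first statistic equals 3, so the test rejects at the 5% level, while the second does not.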


3.2 Local Graph Inference: Confidence Interval 

Although the score test is convenient for hypothesis testing, it does not provide a confidence
interval for the parameter of interest $\Theta_{jk}$. In this subsection, we present a Wald test for
nonparanormal graphical models, which can be equivalently used to construct the confidence
interval for $\Theta_{jk}$.

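The Wald test is also based on the decorrelated pseudo score function. More specifically,
the first-order Taylor approximation of the pseudo score function $\hat S_n(\Theta_{jk}, \hat\Theta_{(jk)^c})$ around $\hat\Theta_{jk}$
gives rise to $\hat S_n(\Theta_{jk}, \hat\Theta_{(jk)^c}) \approx \hat S_n(\hat\Theta) + \nabla_{(jk)} \hat S_n(\hat\Theta)(\Theta_{jk} - \hat\Theta_{jk})$. This Taylor expansion yields
an approximate root of the (approximately) unbiased estimating equation $\hat S_n(\Theta_{jk}, \hat\Theta_{(jk)^c}) = 0$.
We call it the approximate pseudo-likelihood estimator. Since we have
$\nabla_{(jk)} \hat S_n(\hat\Theta) = [e_j^\top \hat\Theta \hat\Sigma e_k + e_j^\top \hat\Sigma \hat\Theta e_k - 1] / (\hat\Theta_{jj} \hat\Theta_{kk})$ and
$\hat S_n(\hat\Theta) = [e_j^\top \hat\Theta^\top \hat\Sigma \hat\Theta e_k - \hat\Theta_{jk}] / (\hat\Theta_{jj} \hat\Theta_{kk})$, after some
algebra, we show that the approximate pseudo-likelihood estimator is given by

$$\hat\Theta^W_{jk} = \frac{\hat\Theta_{jk} e_j^\top \hat\Theta \hat\Sigma e_k + \hat\Theta_{jk} e_j^\top \hat\Sigma \hat\Theta e_k - e_j^\top \hat\Theta^\top \hat\Sigma \hat\Theta e_k}{e_j^\top \hat\Theta \hat\Sigma e_k + e_j^\top \hat\Sigma \hat\Theta e_k - 1}. \tag{3.11}$$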


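In Section 4, we prove that the limiting distribution of $\sqrt{n}(\hat\Theta^W_{jk} - \Theta^*_{jk})$ is $N(0, 4\sigma^2 \cdot H^{-2}_{(jk)|(jk)^c})$,
where $H_{(jk)|(jk)^c} = H_{(jk),(jk)} - H_{(jk),(jk)^c} H^{-1}_{(jk)^c,(jk)^c} H_{(jk)^c,(jk)}$ is the partial Fisher
information. In order to construct the confidence interval, we need to estimate $\sigma$ and
$H_{(jk)|(jk)^c}$.

For $\sigma$, we use the same estimator $\hat\sigma$ used in the score test. For $H_{(jk)|(jk)^c}$, by the definition
of $w^*$, we use the following estimator $\hat H_{(jk)|(jk)^c} = \hat H_{(jk),(jk)} - \hat w^\top \hat H_{(jk)^c,(jk)}$,
which can be further simplified as $\hat H_{(jk)|(jk)^c} = 1/(\hat\Theta_{jj} \hat\Theta_{kk})$. Thus, the two-sided confidence
interval for $\Theta^*_{jk}$ with $1 - \alpha$ coverage probability is given by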

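$$\big[\hat\Theta^W_{jk} - n^{-1/2} \Phi^{-1}(1 - \alpha/2) \cdot L_{jk},\ \hat\Theta^W_{jk} + n^{-1/2} \Phi^{-1}(1 - \alpha/2) \cdot L_{jk}\big], \tag{3.12}$$

where $L_{jk} = 2\hat\sigma \hat\Theta_{jj} \hat\Theta_{kk}$. From the hypothesis test perspective, we can consider the following
Wald test statistic for $H_0: \Theta^*_{jk} = 0$:

$$W_n = (2\hat\sigma)^{-1} \hat H_{(jk)|(jk)^c} \sqrt{n}\, \hat\Theta^W_{jk}. \tag{3.13}$$

Then the Wald test with significance level $\alpha$ is given by

$$\Psi_W(\alpha) = 1(W_n^2 > \chi^2_{1\alpha}). \tag{3.14}$$

The null hypothesis is rejected if and only if $\Psi_W(\alpha) = 1$. The associated p-value of this test is
$P_W = 2(1 - \Phi(|W_n|))$.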
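The estimator (3.11) and the interval (3.12) translate directly into a few lines of linear algebra. The sketch below is illustrative only: `theta_hat`, `corr_hat` and `sd_hat` stand for an initial precision estimate, the rank-based correlation estimate and the estimate of $\sigma$, all of which are made-up toy inputs here, and the function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def wald_estimate(theta_hat, corr_hat, j, k):
    """Approximate pseudo-likelihood estimator (3.11) for entry (j, k)."""
    a = theta_hat[j, :] @ corr_hat[:, k]                  # e_j' Theta Sigma e_k
    b = corr_hat[j, :] @ theta_hat[:, k]                  # e_j' Sigma Theta e_k
    c = theta_hat[:, j] @ corr_hat @ theta_hat[:, k]      # e_j' Theta' Sigma Theta e_k
    t = theta_hat[j, k]
    return (t * a + t * b - c) / (a + b - 1.0)


def wald_interval(theta_w, theta_hat, j, k, sd_hat, n, alpha=0.05):
    """Two-sided interval (3.12) with L_jk = 2 * sd_hat * Theta_jj * Theta_kk."""
    l_jk = 2.0 * sd_hat * theta_hat[j, j] * theta_hat[k, k]
    half = NormalDist().inv_cdf(1.0 - alpha / 2.0) * l_jk / np.sqrt(n)
    return theta_w - half, theta_w + half


theta_hat = np.array([[2.0, -0.5, 0.0],
                      [-0.5, 2.0, -0.3],
                      [0.0, -0.3, 1.5]])
# Idealised case used only as an algebraic sanity check: Sigma_hat = Theta_hat^{-1}.
corr_hat = np.linalg.inv(theta_hat)

theta_w = wald_estimate(theta_hat, corr_hat, 0, 1)
lower, upper = wald_interval(theta_w, theta_hat, 0, 1, sd_hat=0.4, n=400)
```

In this idealised case the one-step correction leaves the initial entry unchanged, so `theta_w` equals `theta_hat[0, 1]`; with a real rank-based `corr_hat` the two generally differ.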


3.3 Global Graph Inference: Confidence Subgraph 

In the previous subsections, we have discussed testing and confidence intervals for individual
components of the precision matrix $\Theta^*$, which is referred to as the local graph inference. Here,
we consider how to construct a confidence subgraph with any specified confidence level, which 
is referred to as the global graph inference. 

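More specifically, we aim at providing a confidence band for $\Theta^*$ with confidence level
$1 - \alpha$, which can be used to construct a confidence subgraph. The main idea is that, based
on the approximate pseudo-likelihood estimator $\hat\Theta^W_{jk}$ in (3.11), if we are able to obtain the
$(1 - \alpha)$-quantile $c_\alpha$ of $\max_{(j,k) \in [d] \times [d]} \sqrt{n} |\hat\Theta^W_{jk} - \Theta^*_{jk}| / L_{jk}$, then we have

$$\max_{(j,k) \in [d] \times [d]} \sqrt{n}\, \frac{|\hat\Theta^W_{jk} - \Theta^*_{jk}|}{L_{jk}} \le c_\alpha \iff \hat\Theta^W_{jk} - \frac{c_\alpha L_{jk}}{\sqrt{n}} \le \Theta^*_{jk} \le \hat\Theta^W_{jk} + \frac{c_\alpha L_{jk}}{\sqrt{n}}, \quad \forall (j, k) \in [d] \times [d],$$

where $L_{jk} = 2\hat\sigma \hat\Theta_{jj} \hat\Theta_{kk}$. Based on the above inequality, we can construct a confidence
subgraph. In detail, starting from a complete graph, for each edge $(j, k)$, if
$0 \in [\hat\Theta^W_{jk} - c_\alpha L_{jk}/\sqrt{n},\ \hat\Theta^W_{jk} + c_\alpha L_{jk}/\sqrt{n}]$, then this edge is removed. We denote the resulting graph as $\hat G$,
which is a $(1 - \alpha)$-confidence subgraph.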
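The edge-removal rule can be sketched in a few lines. The sketch assumes the debiased estimates, the scale factors $L_{jk}$ and the critical value $c_\alpha$ are already available; all names and numbers below are hypothetical toy inputs.

```python
import math


def confidence_subgraph(theta_w, scale, c_alpha, n):
    """Start from the complete graph and drop edge (j, k) when 0 lies in
    [theta_w[j][k] - r, theta_w[j][k] + r] with r = c_alpha * scale[j][k] / sqrt(n)."""
    d = len(theta_w)
    edges = set()
    for j in range(d):
        for k in range(j + 1, d):
            radius = c_alpha * scale[j][k] / math.sqrt(n)
            if abs(theta_w[j][k]) > radius:   # interval excludes 0: keep the edge
                edges.add((j, k))
    return edges


# Toy inputs for illustration.
theta_w = [[2.0, 0.5, 0.02],
           [0.5, 2.0, -0.3],
           [0.02, -0.3, 2.0]]
scale = [[1.0] * 3 for _ in range(3)]
edges = confidence_subgraph(theta_w, scale, c_alpha=3.0, n=400)
```

Here the common radius is $3/\sqrt{400} = 0.15$, so the weak entry $(0, 2)$ is removed and the other two edges are retained.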

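As we have seen, the key to constructing the confidence subgraph is to characterize the
limiting distribution of $\max_{(j,k) \in [d] \times [d]} \sqrt{n} |(\hat\Theta^W_{jk} - \Theta^*_{jk}) / L_{jk}|$. More specifically, we define the
following pivotal quantity

$$T = \max_{(j,k) \in [d] \times [d]} \sqrt{n}\, \big|\hat\Theta^W_{jk} - \Theta^*_{jk}\big| \cdot H_{(jk)|(jk)^c} / (2\sigma). \tag{3.15}$$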
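Define its $(1 - \alpha)$-th quantile as $c_T(1 - \alpha) = \inf\{t \in \mathbb{R} : \mathbb{P}(T \le t) \ge 1 - \alpha\}$. Since we can show
that

$$\max_{(j,k) \in [d] \times [d]} \Big| \sqrt{n}(\hat\Theta^W_{jk} - \Theta^*_{jk}) \cdot H_{(jk)|(jk)^c} / (2\sigma) + \sqrt{n}\, b_{(jk)}^\top \mathrm{vec}(F \odot \bar G) / (2\sigma) \Big| = o_P(1), \tag{3.16}$$

where $b_{(jk)}^\top \mathrm{vec}(F \odot \bar G)$ is the Hájek projection of $S_n^{(jk)}(\Theta^*)$, this motivates us to construct a
Gaussian multiplier bootstrap estimator of $T$ as follows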


$$W = \max_{(j,k) \in [d] \times [d]} \Big| \frac{1}{\sqrt{n}} \sum_{i=1}^n \hat Z_{ijk} e_i \Big|, \tag{3.17}$$


where $\hat Z_{ijk} = -1/(2\hat\sigma) \cdot \hat b_{(jk)}^\top \mathrm{vec}(\hat F \odot \hat G^{(i)})$ with $\hat b_{(jk)} = [1, -\hat w_{(jk)}^\top]^\top$, $\hat G^{(i)} = [\hat G^{(i)}_{jk}] \in \mathbb{R}^{d \times d}$ such
that $\hat G^{(i)}_{jk} = 1/(n-1) \sum_{i' \neq i} h^{ii'}_{jk}(\hat\Theta)$, $\hat F = [\hat F_{jk}] \in \mathbb{R}^{d \times d}$ is the corresponding estimate of $F$, and
$\{e_i\}_{i=1}^n$ is a sequence of i.i.d. standard normal random variables that are independent of
$\{X_i\}_{i=1}^n$. Note that although we cannot derive the quantile of $W$ analytically, we can compute
it using the Monte Carlo method as in standard bootstrap methods. We prove in Section 4
that the quantile of $W$ converges to the quantile of $T$ uniformly. To justify the theory of
the resulting Gaussian multiplier bootstrap, we need a more careful analysis of the uniform
approximation error in (3.16) due to the Hájek projection. Given the validity of the resulting
Gaussian multiplier bootstrap, we can obtain a uniform confidence interval of each $\Theta^*_{jk}$ as

$$\big[\hat\Theta^W_{jk} - n^{-1/2} c_W(1 - \alpha/2) \cdot L_{jk},\ \hat\Theta^W_{jk} + n^{-1/2} c_W(1 - \alpha/2) \cdot L_{jk}\big],$$

for all $(j, k) \in [d] \times [d]$, where $c_W(1 - \alpha/2)$ is the $(1 - \alpha/2)$-quantile of $W$ and $L_{jk} = 2\hat\sigma \hat\Theta_{jj} \hat\Theta_{kk}$,
from which we can immediately construct the confidence subgraph.
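The Monte Carlo step for the bootstrap quantile can be sketched generically. The code below takes an $n \times m$ array of influence terms (one column per edge), draws Gaussian multipliers, and returns an empirical quantile of the maximum of the normalized weighted sums. It is a generic Gaussian multiplier bootstrap under the assumption that the statistic has the form $\max |n^{-1/2} \sum_i z_i e_i|$; the exact scaling used in (3.17) should be taken from the source. All names and inputs are hypothetical.

```python
import math
import random


def multiplier_bootstrap_quantile(z, level, n_boot=2000, seed=0):
    """Empirical `level`-quantile of max_m |n^{-1/2} sum_i z[i][m] * e_i|, e_i ~ N(0, 1)."""
    rng = random.Random(seed)
    n, m = len(z), len(z[0])
    draws = []
    for _ in range(n_boot):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]          # fresh multipliers
        sums = [sum(z[i][col] * e[i] for i in range(n)) for col in range(m)]
        draws.append(max(abs(s) for s in sums) / math.sqrt(n))
    draws.sort()
    # Smallest order statistic whose empirical CDF reaches `level`.
    return draws[min(n_boot - 1, math.ceil(level * n_boot) - 1)]


# Toy check: one column of ones makes the statistic |N(0, 1)| exactly.
n = 50
z_one = [[1.0] for _ in range(n)]
q_one = multiplier_bootstrap_quantile(z_one, level=0.95)

# Adding columns can only increase the maximum for the same multiplier draws.
z_three = [[1.0, 0.5 * (-1) ** i, 0.1 * i] for i in range(n)]
q_three = multiplier_bootstrap_quantile(z_three, level=0.95)
```

For the single-column toy input the result should be close to the 97.5% standard normal quantile, up to Monte Carlo error.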


4 Theory of Local and Global Inference 

In this section, we establish the main theoretical results for the local and global inferential 
procedures proposed in Section 3. 

4.1 Asymptotic Distribution of Score Test Under the Null Hypothesis 

We consider the following precision matrix class with some $M > 0$ and $s > 0$,

$$\mathcal{U}(s, M) = \Big\{ \Theta : \Theta \succ 0,\ \|\Theta\|_1 \le M,\ \max_{1 \le j \le d} \sum_{k=1}^d 1(\Theta_{jk} \neq 0) \le s \Big\}, \tag{4.1}$$

where $\Theta \succ 0$ indicates that $\Theta$ is a symmetric and positive definite matrix. Note that
$\max_{1 \le j \le d} \sum_{k=1}^d 1(\Theta_{jk} \neq 0) \le s$ constrains the maximum degree of the graph associated with
the sparse precision matrix to be no more than $s$.

As shown in Section 3, the test statistic $ST_n$ depends on the pseudo empirical score
function $\hat S_n(\hat\Theta)$ defined in (3.8). To characterize the limiting distribution of $\hat S_n(\hat\Theta)$, we first
derive the limiting distribution of $S_n(\Theta^*)$ and then show that $\hat S_n(\hat\Theta)$ converges to $S_n(\Theta^*)$
uniformly.

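Since $S_n(\Theta^*)$ is a linear combination of $\nabla \ell_n(\Theta^*)$, it is pivotal to analyze the limiting
behavior of $\nabla \ell_n(\Theta^*)$. For nonparanormal graphical models, one challenge is that the pseudo
score function $\nabla \ell_n(\Theta^*)$ is a function of U-statistics. To establish its asymptotic normality, as we
show in Section 3, we use the Hájek projection technique (Van der Vaart, 1998) to approximate
this U-statistic by a sum of i.i.d. random variables, i.e., $\nabla \ell_n(\Theta^*) = T \odot G \approx F \odot \bar G$. Recall
that $T = [T_{jk}] \in \mathbb{R}^{d \times d}$ with $T_{jk} = T_{jk}(\Theta^*)$, $F = [F_{jk}] \in \mathbb{R}^{d \times d}$ such that $F_{jk} = \sqrt{1 - [\Theta^{*-1}]_{jk}^2}$,
$G = [G_{jk}] \in \mathbb{R}^{d \times d}$ such that $G_{jk} = 2/(n(n-1)) \sum_{1 \le i < i' \le n} h^{ii'}_{jk}(\Theta^*)$, and $\bar G = [\bar G_{jk}] \in \mathbb{R}^{d \times d}$
such that $\bar G_{jk} = 2/n \sum_{i=1}^n g_{jk}(X_i, \Theta^*)$. Note that $\bar G$ is composed of a sum of independent
random variables, and $F_{jk} \cdot 2/n \cdot g_{jk}(X_i, \Theta^*)$ is the projection of $\nabla \ell_n(\Theta^*)$ onto the $\sigma$-field
generated by $X_i$. We show that $F \odot \bar G$ is a good approximation of $\nabla \ell_n(\Theta^*)$.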

Furthermore, it is easy to show that $\mathbb{E}[g_{jk}(X_i, \Theta^*)] = 0$. Thus, we have $\mathbb{E}[F \odot \bar G] = 0$.
Let $G^{(i)} = [G^{(i)}_{jk}]$ be a matrix such that $G^{(i)}_{jk} = g_{jk}(X_i, \Theta^*)$. Then the variance of $\nabla \ell_n(\Theta^*)$ can
be approximated by

$$R = \mathbb{E}\big[\mathrm{vec}(F \odot G^{(i)}) \mathrm{vec}(F \odot G^{(i)})^\top\big], \tag{4.2}$$

because $R$ is the second moment of the Hájek projection of $\nabla \ell_n(\Theta^*)$, which is a proxy to the
variance of $\nabla \ell_n(\Theta^*)$. Before presenting the main theoretical results, we first make several
mild regularity assumptions, which are essential to establish our results.

Assumption 4.1. There exists a constant $C > 0$ such that $\lambda_{\min}(R) \ge C$.

Assumption 4.1 says that the minimum eigenvalue of $R$ is bounded away from 0.
This assumption is a common condition to guarantee non-degeneracy of U-statistics (Van der
Vaart, 1998). It guarantees the validity of using $F \odot \bar G$ to approximate $\nabla \ell_n(\Theta^*)$. In order to
derive the asymptotic normality of $S_n(\Theta^*)$, we require the following assumption.

Assumption 4.2. There exists a constant $\nu > 0$ such that $1/\nu \le \lambda_{\min}(\Sigma^*) \le \lambda_{\max}(\Sigma^*) \le \nu$.
In addition, there exists a constant $K_{\Sigma^*} > 0$, such that $\|\Sigma^*\|_\infty \le K_{\Sigma^*}$.

The first part of Assumption 4.2 requires that the smallest eigenvalue of the correlation matrix
$\Sigma^*$ is bounded away from zero, and its largest eigenvalue is finite. This implies that
$1/\nu \le \lambda_{\min}(\Theta^*) \le \lambda_{\max}(\Theta^*) \le \nu$. This assumption is commonly imposed in the literature
for the analysis of Gaussian graphical models (Ravikumar et al., 2011; Yuan, 2010; Cai et al.,
2011) and nonparanormal models (Liu et al., 2009, 2012; Xue and Zou, 2012). The second
part of Assumption 4.2 has been made in Ravikumar et al. (2011); Jankova and van de
Geer (2013). For example, this assumption is satisfied by the Toeplitz covariance matrix, i.e.,
$\Sigma^*_{jk} = \rho^{|j - k|}$, where $|\rho| < 1$.

At the core of our proof, we show that $\sqrt{n}(\hat S_n(0, \hat\Theta_{(jk)^c}) - S_n(0, \Theta^*_{(jk)^c})) = o_P(1)$. To show
this, we also need the following assumption on the estimation error of the precision matrix
estimator $\hat\Theta$.

Assumption 4.3. The estimator $\hat\Theta$ satisfies

$$\|\hat\Theta - \Theta^*\|_{\max} = O_P\big(\sqrt{\log d / n}\big), \qquad \|\hat\Theta - \Theta^*\|_1 = O_P\big(s \sqrt{\log d / n}\big).$$

Assumption 4.3 essentially requires that $\hat\Theta$ has sufficiently fast convergence rates in terms
of both the elementwise max norm and the matrix $\ell_1$ norm, which is the key to showing that the
remainder term diminishes asymptotically. It can be shown that for any precision matrix $\Theta^*$
belonging to the class of matrices $\mathcal{U}(s, M)$ for some constant $M$, the CLIME (Cai et al., 2011)
estimator enjoys all these theoretical guarantees.

Remark 4.4. We can also use the graphical Dantzig selector and neighborhood selection estimators
to estimate the precision matrix of nonparanormal graphical models. However, they do not
enjoy optimal estimation rates in terms of the matrix elementwise max norm. Although this
norm can be crudely bounded by the spectral norm, doing so results in suboptimal scaling for the
hypothesis test. Similarly, the graphical Lasso estimator does not have estimation guarantees in terms
of the matrix $\ell_1$ norm by only making assumptions on the smallest eigenvalue of $\Sigma^*$. To
attain matrix $\ell_1$ norm based estimation error bounds, we need to make substantially stronger
assumptions such as the irrepresentable condition (Ravikumar et al., 2011). Thus, we use the
CLIME estimator.

Assumption 4.5. It holds that $\lim_{n \to \infty} s \log d / n^{1/2} = 0$.

Remark 4.6. Assumption 4.5 is a mild assumption, which says $\log d = o(n^{1/2}/s)$. Recall
that for parameter estimation, we require $\log d = o(n/s^2)$. This implies that in order to make
the pseudo score test a valid procedure, we require that $d$ goes to infinity at a relatively
slower rate. Moreover, we comment that our condition $\log d = o(n^{1/2}/s)$ is weaker than that
in Jankova and van de Geer (2013) and already matches the best possible scaling under the
Gaussian graphical model (Ren et al., 2013; Liu, 2013).

Equipped with Assumptions 4.1-4.5, we are ready to present the main result for score 
test, which establishes the asymptotic normality of the estimated decorrelated score function 
in (3.8). 

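Theorem 4.7. Under Assumptions 4.1, 4.2, 4.3 and 4.5, as $n, d \to \infty$, we have

$$\sqrt{n} \hat S_n(0, \hat\Theta_{(jk)^c}) = \sqrt{n} S_n(0, \Theta^*_{(jk)^c}) + o_P(1) \rightsquigarrow N(0, 4\sigma^2),$$

where $\sigma^2 = R_{(jk),(jk)} - 2 R_{(jk),(jk)^c} w^* + w^{*\top} R_{(jk)^c,(jk)^c} w^*$.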

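By Theorem 4.7, to make $\sqrt{n} \hat S_n(0, \hat\Theta_{(jk)^c})$ a valid test statistic, we need to estimate its
asymptotic variance $\sigma^2$, which depends on the unknown matrix $R$ and $w^*$. Based on the
definition of $\sigma^2$ in Theorem 4.7, we can estimate $\sigma^2$ by plugging in $\hat R$ and $\hat w$. This leads to
the following estimator:

$$\hat\sigma^2 = \hat R_{(jk),(jk)} - 2 \hat R_{(jk),(jk)^c} \hat w + \hat w^\top \hat R_{(jk)^c,(jk)^c} \hat w, \tag{4.3}$$

where $\hat R$ is defined as follows

$$\hat R = \frac{1}{n} \sum_{i=1}^n \mathrm{vec}(\hat F \odot \hat G^{(i)}) \mathrm{vec}(\hat F \odot \hat G^{(i)})^\top. \tag{4.4}$$

Recall that $\hat G^{(i)} = [\hat G^{(i)}_{jk}] \in \mathbb{R}^{d \times d}$ is defined as $\hat G^{(i)}_{jk} = 1/(n-1) \sum_{i' \neq i} h^{ii'}_{jk}(\hat\Theta)$ and $\hat F = [\hat F_{jk}] \in \mathbb{R}^{d \times d}$
is the corresponding estimate of $F$ introduced in Section 3.3. It is easy to show that $\hat\sigma^2$ in (4.3) is a consistent estimator of
$\sigma^2$. This, together with Theorem 4.7, establishes the validity of the score test statistic in (3.9)
under the null hypothesis.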

Corollary 4.8. Under the same assumptions as in Theorem 4.7, we have

$$\lim_{n \to \infty} \mathbb{P}(\Psi_S(\alpha) = 1 \mid H_0) = \alpha \quad \text{and} \quad P_S \rightsquigarrow U[0, 1],$$

where $U[0, 1]$ is a random variable with a uniform distribution on $[0, 1]$.

4.2 Asymptotic Power Under Local Alternative Hypotheses 

In this subsection, we analyze the power of the score test for detecting the alternative
hypothesis. More specifically, we are interested in the limiting behavior of $ST_n$ under a
sequence of alternative hypotheses $H_{1n}: \Theta_{jk} = K \cdot n^{-\eta}$, where $K$ is a constant, and $\eta$ is a
positive constant. We consider the following parameter space $\mathcal{U}_1^{(jk)}(K, \eta, s^*, M^*)$:

$$\mathcal{U}_1^{(jk)}(K, \eta, s^*, M^*) = \Big\{ \Theta : \Theta \succ 0,\ \Theta_{jk} = K \cdot n^{-\eta},\ \|\Theta\|_1 \le M^*,\ \max_{1 \le j \le d} \sum_{k=1}^d 1(\Theta_{jk} \neq 0) \le s^* \Big\},$$

where $s^* = \max_{1 \le j \le d} \sum_{k=1}^d 1(\Theta^*_{jk} \neq 0)$. The parameter space $\mathcal{U}_1^{(jk)}(K, \eta, s^*, M^*)$
characterizes the local alternative hypotheses around the null hypothesis $\Theta_{jk} = 0$, in the sense
that $\Theta_{jk} = K \cdot n^{-\eta}$ goes to 0 as $n \to \infty$. To ensure that the parameters are estimable under
the alternatives, we focus on the sparse local alternative hypotheses. In the rest of this paper,
for notational simplicity, we drop the superscript $(jk)$ in $\mathcal{U}_1^{(jk)}$ when it can be inferred
from context.

In the following assumption, we assume that the estimator $\hat\Theta$ has desirable estimation
error rates for any $\Theta$ in $\mathcal{U}_1(K, \eta, s^*, M^*)$ uniformly.

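Assumption 4.9. The estimator $\hat\Theta$ satisfies

$$\lim_{n \to \infty} \inf_{\Theta \in \mathcal{U}_1(K, \eta, s^*, M^*)} \mathbb{P}_\Theta\Big( \|\hat\Theta - \Theta\|_{\max} \le C_0 \sqrt{\log d / n} \Big) = 1,$$

$$\lim_{n \to \infty} \inf_{\Theta \in \mathcal{U}_1(K, \eta, s^*, M^*)} \mathbb{P}_\Theta\Big( \|\hat\Theta - \Theta\|_1 \le C_1 s \sqrt{\log d / n} \Big) = 1,$$

where $C_0, C_1$ are positive constants.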
Assumption 4.9 is analogous to Assumption 4.3, which holds for the CLIME estimator; see
Cai et al. (2011). Now given Assumptions 4.1, 4.2 and 4.9, we are able to prove the following
theorem, which characterizes the uniform limiting distributions of the score test statistic $ST_n$
in (3.9) for $\Theta$ in $\mathcal{U}_1(K, \eta, s^*, M^*)$.


Theorem 4.10. Under Assumptions 4.1, 4.2 and 4.9, suppose that $\sqrt{n}\, s \log d / n + K \cdot n^{-\eta} s \sqrt{\log d / n} = o(1)$. If $n$ is sufficiently large, it holds that

$$\lim_{n\to\infty} \sup_{\Theta \in \mathcal{U}_1(K,\eta,s^*,M^*)} \sup_{t \in \mathbb{R}} |P_\Theta(ST_n \le t) - \Phi(t)| = 0, \quad \text{if } \eta > 1/2, \quad (4.5)$$

$$\lim_{n\to\infty} \sup_{\Theta \in \mathcal{U}_1(K,\eta,s^*,M^*)} \sup_{t \in \mathbb{R}} \Big|P_\Theta(ST_n \le t) - \Phi\Big(t + K \frac{H_{(jk)|(jk)^c}}{2\sigma}\Big)\Big| = 0, \quad \text{if } \eta = 1/2, \quad (4.6)$$

$$\lim_{n\to\infty} \sup_{\Theta \in \mathcal{U}_1(K,\eta,s^*,M^*)} P_\Theta(|ST_n| \le t) = 0 \ \text{ for any fixed } t \in \mathbb{R}, \quad \text{if } \eta < 1/2. \quad (4.7)$$

In fact, this theorem also implies the uniform convergence of $ST_n$ under the null hypothesis,
which can be seen by taking $K = 0$ in (4.6). Recall that the power of the score test $\Psi_S(\alpha)$ is defined
as the probability that $\Psi_S(\alpha) = 1$ when $\Theta \in \mathcal{U}_1(K, \eta, s^*, M^*)$, and that the
type I error of $\Psi_S(\alpha)$ is controlled at level $\alpha$ asymptotically, as shown in Corollary 4.8.
Theorem 4.10 immediately characterizes the uniform asymptotic power of $\Psi_S(\alpha)$ under the
alternative hypothesis $H_{1n}: \Theta_{jk} = K n^{-\eta}$. In particular, Theorem 4.10 implies that

$$\lim_{n\to\infty} \sup_{\Theta \in \mathcal{U}_1(K,\eta,s^*,M^*)} \sup_{\alpha \in (0,1)} |P_\Theta(\Psi_S(\alpha) = 1) - \alpha| = 0, \quad \text{if } \eta > 1/2, \quad (4.8)$$

$$\lim_{n\to\infty} \sup_{\Theta \in \mathcal{U}_1(K,\eta,s^*,M^*)} \sup_{\alpha \in (0,1)} |P_\Theta(\Psi_S(\alpha) = 1) - \psi_\alpha| = 0, \quad \text{if } \eta = 1/2, \quad (4.9)$$

where $\psi_\alpha = 1 - \Phi\big(\Phi^{-1}(1 - \alpha/2) + K H_{(jk)|(jk)^c}/(2\sigma)\big) + \Phi\big(-\Phi^{-1}(1 - \alpha/2) + K H_{(jk)|(jk)^c}/(2\sigma)\big)$,
and for $\alpha \in [\delta, 1)$ with $\delta > 0$, we have

$$\lim_{n\to\infty} \inf_{\Theta \in \mathcal{U}_1(K,\eta,s^*,M^*)} P_\Theta(\Psi_S(\alpha) = 1) = 1, \quad \text{if } \eta < 1/2. \quad (4.10)$$

More specifically, (4.8) implies that the score test $\Psi_S(\alpha)$ has no power beyond the type I
error to distinguish $H_0$ from $H_{1n}$ if $\eta > 1/2$. In addition, when $\eta = 1/2$, since $\psi_\alpha > \alpha$ for
any $K \neq 0$, (4.9) indicates that $\Psi_S(\alpha)$ has asymptotic power larger than the type I error for
detecting the alternative hypothesis $H_{1n}: \Theta_{jk} = K n^{-1/2}$. We plot $\psi_\alpha$ in Figure 1. We can see
that the larger $K$ is, the larger the power the pseudo score test can achieve. In other words, our
proposed pseudo score test attains larger power when the magnitude of $\Theta^*_{jk}$ is large. Lastly,
(4.10) implies that, if $\eta < 1/2$, the minimal power of $\Psi_S(\alpha)$ goes to 1 as $n \to \infty$.
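
To make the limiting power $\psi_\alpha$ in the case $\eta = 1/2$ concrete, the sketch below evaluates it for given values of $K$, $H_{(jk)|(jk)^c}$ and $\sigma$. The argument names `K`, `H` and `sigma` are placeholders for these quantities, and any numerical values passed in are hypothetical rather than taken from the paper.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def asymptotic_power(K: float, H: float, sigma: float, alpha: float = 0.05) -> float:
    """Evaluate psi_alpha = 1 - Phi(z + c) + Phi(-z + c),
    with z = Phi^{-1}(1 - alpha / 2) and c = K * H / (2 * sigma)."""
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = K * H / (2.0 * sigma)
    return 1.0 - _STD_NORMAL.cdf(z + shift) + _STD_NORMAL.cdf(-z + shift)
```

At $K = 0$ the expression reduces to $\alpha$, and it is symmetric in the sign of $K$, so the power depends on $K$ only through its magnitude.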


4.3 Theoretical Property of Confidence Interval 

In this subsection, we present the main theoretical result of the confidence interval and the 
related Wald test.

Figure 1: The plot of $\psi_\alpha = 1 - \Phi\big(\Phi^{-1}(1 - \alpha/2) + K H_{(jk)|(jk)^c}/(2\sigma)\big) + \Phi\big(-\Phi^{-1}(1 - \alpha/2) + K H_{(jk)|(jk)^c}/(2\sigma)\big)$, when
$\alpha = 0.05$ and $\alpha = 0.1$. The x-axis is the value of $K$ and the y-axis is the value of $\psi_\alpha$.


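Theorem 4.11. Under the same assumptions as Theorem 4.7, we have

$$\sqrt{n}(\tilde{\Theta}_{jk} - \Theta^*_{jk}) = -\sqrt{n}\, H^{-1}_{(jk)|(jk)^c} S_n(\Theta^*) + o_P(1) \rightsquigarrow N\big(0, 4\sigma^2 H^{-2}_{(jk)|(jk)^c}\big),$$

where $H_{(jk)|(jk)^c} = H_{(jk),(jk)} - H_{(jk),(jk)^c} H^{-1}_{(jk)^c,(jk)^c} H_{(jk)^c,(jk)}$ is the partial information
matrix for $\Theta^*_{jk}$.

The asymptotic variance of the pseudo-likelihood estimator is $4\sigma^2 \cdot H^{-2}_{(jk)|(jk)^c}$, where both
$\sigma^2$ and $H_{(jk)|(jk)^c}$ are unknown. To apply Theorem 4.11, we need consistent estimators for
$\sigma^2$ and $H_{(jk)|(jk)^c}$. For $\sigma^2$, we have already provided a consistent estimator $\hat{\sigma}^2$ in (4.3). For
$H_{(jk)|(jk)^c}$, we use $\hat{H}_{(jk)|(jk)^c} = 1/(\hat{\Theta}_{jj}\hat{\Theta}_{kk})$. To summarize, the following corollary indicates
that the asymptotic variance of $\sqrt{n}(\tilde{\Theta}_{jk} - \Theta^*_{jk})$ can be consistently estimated.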

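Corollary 4.12. Under the same assumptions as Theorem 4.11, we have

$$(2\hat{\sigma})^{-1} \hat{H}_{(jk)|(jk)^c} \sqrt{n}(\tilde{\Theta}_{jk} - \Theta^*_{jk}) \rightsquigarrow N(0, 1),$$

where $\hat{H}_{(jk)|(jk)^c} = 1/(\hat{\Theta}_{jj}\hat{\Theta}_{kk})$.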
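
The pivot in Corollary 4.12 can be inverted to obtain an interval for $\Theta^*_{jk}$. The sketch below assumes the confidence interval has the usual form, the estimate plus or minus $z_{1-\alpha/2} \cdot 2\hat{\sigma}/(\sqrt{n}\,\hat{H}_{(jk)|(jk)^c})$; the function and argument names are illustrative, and the exact interval defined earlier in the paper is not reproduced here.

```python
from math import sqrt
from statistics import NormalDist


def wald_confidence_interval(theta_tilde, sigma_hat, theta_jj, theta_kk, n, alpha=0.05):
    """Interval obtained by inverting (2*sigma_hat)^{-1} * H_hat * sqrt(n) * (theta_tilde - theta) ~ N(0, 1)."""
    h_hat = 1.0 / (theta_jj * theta_kk)          # plug-in estimate of H_{(jk)|(jk)^c}
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal quantile
    half_width = z * 2.0 * sigma_hat / (sqrt(n) * h_hat)
    return theta_tilde - half_width, theta_tilde + half_width
```

The half-width shrinks at the rate $1/\sqrt{n}$ and grows with $\hat{\sigma}$ and with the product $\hat{\Theta}_{jj}\hat{\Theta}_{kk}$.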

Corollary 4.12 verifies the validity of the confidence interval (3.12). Furthermore, the
following corollary shows that under $H_0$, the type I error of the Wald test $\Psi_W(\alpha)$ defined in (3.14)
converges to the significance level and the p-value is asymptotically uniformly distributed.

Corollary 4.13. Under the same assumptions as Theorem 4.11, we have

$$\lim_{n\to\infty} P(\Psi_W(\alpha) = 1 \mid H_0) = \alpha \quad\text{and}\quad P_W \rightsquigarrow U[0, 1],$$

where $P_W = 2(1 - \Phi(|W_n|))$ is the p-value of the Wald test.

4.4 Theoretical Property of Confidence Subgraph

We present the theoretical property of our confidence subgraph. Recall that the global
test statistic in (3.15) is constructed based on the Wald test statistic, so the global inference
of the confidence subgraph is closely related to the Wald test. As described in Section 3, we
use the $\alpha$-quantile of $W$ conditioned on the data as an estimator of the $\alpha$-quantile of $T$, i.e.,
$c_W(\alpha) = \inf\{t \in \mathbb{R} : P_e(W \le t) \ge \alpha\}$. We now lay out the main theorem for the confidence
subgraph $\hat{G}$.

Theorem 4.14. Under the same assumptions as Theorem 4.11, we have

$$\lim_{n\to\infty} \sup_{\alpha \in (0,1)} |P(T \le c_W(\alpha)) - \alpha| = 0,$$

and the confidence subgraph $\hat{G}$ satisfies $\lim_{n\to\infty} P(\hat{G} \subseteq G^*) \ge 1 - \alpha$.

5 Numerical Experiments 

In this section, we empirically verify our theoretical results on both synthetic and real-world
datasets. In all the experiments, we use the CLIME estimator implemented in the R package
clime (ver. 0.4.1). For synthetic datasets, we use the R package huge (ver. 1.2.6) to generate
samples with different graph structures.

5.1 Synthetic Datasets 

In our numerical simulations, we consider two settings: (i) n = 100, d = 100; (ii) n = 100, d = 400,
and generate data from the nonparanormal distribution $X \sim NPN(f, \Sigma)$. For the monotonic
transformations $f$, we consider two settings: (i) the extended square root function $f^{-1}(x) =
\mathrm{sign}(x)|x|^{1/2} / \sqrt{\int |t| \phi(t)\,dt}$; (ii) the cubic function $f^{-1}(x) = x^3 / \sqrt{\int t^6 \phi(t)\,dt}$, where $\phi(t)$ is the
marginal density function. For the choice of $\Sigma$, we construct three different graph structures:
scale-free graph, hub graph and band3 graph.
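
The two marginal transformations can be sketched in Python as follows. This sketch assumes that $\phi$ is the standard normal density, in which case $\int |t|\phi(t)\,dt = \sqrt{2/\pi}$ and $\int t^6 \phi(t)\,dt = 15$; if a different marginal density is intended, the two constants must be changed accordingly. The function names are illustrative.

```python
from math import copysign, pi, sqrt

# Normalizing constants under the assumption that phi is the N(0, 1) density.
E_ABS_Z = sqrt(2.0 / pi)  # integral of |t| * phi(t) dt
E_Z6 = 15.0               # integral of t^6 * phi(t) dt


def inv_sqrt_transform(x: float) -> float:
    """Extended square root function: sign(x) * |x|^(1/2) / sqrt(E|Z|)."""
    return copysign(sqrt(abs(x)), x) / sqrt(E_ABS_Z)


def inv_cubic_transform(x: float) -> float:
    """Cubic function: x^3 / sqrt(E[Z^6])."""
    return x ** 3 / sqrt(E_Z6)
```

Both functions are odd and strictly increasing, and the normalization gives the transformed standard normal variable a second moment equal to one.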

The detailed procedures for generating the three kinds of graphical models are as follows.

Scale-free graph. The degree distribution of the scale-free graph follows a power law. The
graph is generated by the preferential attachment mechanism. The graph begins with an
initial small chain graph of 2 nodes. New nodes are added to the graph one at a time. Each
new node is connected to one existing node with a probability that is proportional to the
degree of that existing node. Formally, the probability $p_i$ that the
new node is connected to an existing node $i$ is $p_i = k_i / \sum_j k_j$, where $k_i$ is the degree of
node $i$. The resulting graph has $d$ edges ($d = 100$ or $d = 400$). Once the graph is obtained, we
generate an adjacency matrix $A$ by setting the nonzero off-diagonal elements to be 0.3 and
the diagonal elements to be 0. We calculate its smallest eigenvalue $\lambda_{\min}(A)$. The precision
matrix is constructed as $\Theta = A + (|\lambda_{\min}(A)| + 0.2) \cdot I_d$. The covariance matrix $\Sigma := \Theta^{-1}$ is
then computed to generate multivariate normal data with distribution $N_d(0, \Sigma)$. The nonparanormal data
are obtained by applying the marginal transformations to these Gaussian samples.

Hub graph. The $d$ nodes are evenly partitioned into $d/20$ disjoint groups with each group
containing 20 nodes. Within each group, one node is selected as the hub and we add edges
between the hub and the other 19 nodes in that group. The resulting graph has 190 edges
when d = 200 and 380 edges when d = 400.

Band3 graph. Each node is associated with a coordinate $j$ with $j = 1, \ldots, d$. Two nodes are
connected by an edge whenever the corresponding coordinates are at distance less than or
equal to 3. The resulting graph has 294 edges when d = 100 and 1,194 edges
when d = 400.
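
The three graph structures described above can be sketched as edge-list generators. This is a plain Python illustration of the stated rules, not the code of the huge package; the function names and the `seed` argument are assumptions made for the example. Under this reading of the preferential attachment rule, the scale-free sketch returns a tree with d - 1 edges, so its edge count may differ slightly from the count reported in the text.

```python
import random


def scale_free_edges(d: int, seed: int = 0) -> list:
    """Preferential attachment: start from a 2-node chain, then attach each
    new node to one existing node i with probability k_i / sum_j k_j."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    degree = [1, 1] + [0] * (d - 2)
    for new in range(2, d):
        target = rng.choices(range(new), weights=degree[:new], k=1)[0]
        edges.append((target, new))
        degree[target] += 1
        degree[new] += 1
    return edges


def hub_edges(d: int, group_size: int = 20) -> list:
    """Partition nodes into consecutive groups; the first node of each group is the hub."""
    edges = []
    for start in range(0, d, group_size):
        for other in range(start + 1, min(start + group_size, d)):
            edges.append((start, other))
    return edges


def band_edges(d: int, bandwidth: int = 3) -> list:
    """Connect nodes whose coordinates differ by at most `bandwidth`."""
    return [(j, k) for j in range(d) for k in range(j + 1, min(j + bandwidth, d - 1) + 1)]
```

For example, `len(hub_edges(200))` and `len(band_edges(100))` reproduce the edge counts 190 and 294 stated above.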

We apply the proposed Score and Wald tests to the simulated nonparanormal data. In
particular, we use CLIME for parameter estimation and use 5-fold cross validation to select
the regularization parameter $\lambda$. We compute the type I errors at the 0.05 and 0.10 significance
levels, and compare them with the results given by the desparsity method (Jankova and van de
Geer, 2013), which assumes that the data come from a Gaussian graphical model. The methods
by Ren et al. (2013) and Liu (2013) yield similar results to Jankova and van de Geer (2013),
so we only report the results of Jankova and van de Geer (2013). We repeat the tests
500 times. Table 1 and Table 2 show the type I errors of the Score and Wald tests and the desparsity
method.
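
When reading the empirical type I errors, it helps to keep the Monte Carlo error in mind. The sketch below uses the standard binomial standard error $\sqrt{\alpha(1-\alpha)/B}$ for a rejection frequency estimated from $B$ independent repetitions; the function names and the two-standard-error band are choices made for this illustration and are not part of the paper's procedure.

```python
from math import sqrt


def mc_standard_error(alpha: float, reps: int) -> float:
    """Standard error of an empirical rejection rate when the true rate is alpha."""
    return sqrt(alpha * (1.0 - alpha) / reps)


def within_mc_band(observed: float, alpha: float, reps: int, width: float = 2.0) -> bool:
    """Check whether an observed rate lies within `width` standard errors of alpha."""
    return abs(observed - alpha) <= width * mc_standard_error(alpha, reps)
```

With 500 repetitions at the 0.05 level the standard error is roughly 0.01, so an empirical rate of 0.055 is compatible with the nominal level while a rate of 0.120 is not.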


Table 1: Type I errors of the Score, Wald and desparsity methods, with the extended square root
transformation function.

                                  scale-free            hub                   band3
method       significance level   d = 100   d = 400     d = 200   d = 400     d = 100   d = 400
Score        0.05                 0.055     0.060       0.050     0.045       0.050     0.055
Wald         0.05                 0.045     0.045       0.050     0.055       0.050     0.045
desparsity   0.05                 0.120     0.115       0.075     0.075       0.105     0.090
Score        0.10                 0.090     0.115       0.105     0.095       0.100     0.095
Wald         0.10                 0.095     0.110       0.100     0.105       0.090     0.100
desparsity   0.10                 0.170     0.205       0.155     0.175       0.160     0.155


We now compare the power of the hypothesis tests at the 0.05 significance level. We use the
scale-free graph as a case study. Similar results can be observed for the hub and band3 graphs as
well. In particular, we generate the precision and covariance matrices by the same procedure
as in the scale-free model. To obtain the power curve, we randomly select one edge in the graph, and
change its weight from 0 to 0.8 incrementally. Each time we generate n = 200 data samples
from the nonparanormal graphical model with the extended square root transformation. Then
we apply the different test methods to the generated data. We repeat each experiment
500 times and record the averaged power. The power curves are shown in Figure 2.

From these simulation results, we see that our methods achieve accurate type I errors
and are more powerful than the desparsity method. In comparison, the desparsity method
does not control the type I error well, which is not surprising because it is mainly designed
for Gaussian graphical models. In summary, the proposed hypothesis testing methods yield
accurate testing results, and outperform the existing methods.

Table 2: Type I errors of the Score, Wald and desparsity methods, with the cubic transformation
function.

                                  scale-free            hub                   band3
method       significance level   d = 100   d = 400     d = 200   d = 400     d = 100   d = 400
Score        0.05                 0.055     0.055       0.045     0.055       0.050     0.055
Wald         0.05                 0.040     0.045       0.050     0.066       0.050     0.060
desparsity   0.05                 0.150     0.145       0.075     0.075       0.105     0.110
Score        0.10                 0.095     0.115       0.105     0.095       0.100     0.095
Wald         0.10                 0.095     0.110       0.110     0.100       0.095     0.090
desparsity   0.10                 0.210     0.235       0.160     0.160       0.215     0.190


Figure 2: Power curves for testing $H_0: \Theta_{jk} = 0$ at the 0.05 significance level. (a) is for d = 100
and (b) is for d = 200. Panel titles: (a) power curves, extended square root transformation;
(b) power curves, cubic transformation. The x-axis is the signal strength.

5.2 Real-world Datasets 

In this section, we apply our method to a human gene expression dataset. The dataset can
be found in the R package BDgraph. It contains 60 unrelated individuals of Northern and
Western European ancestry from Utah (CEU), whose genotypes are available from the Sanger
Institute website (ftp://ftp.sanger.ac.uk/pub/genevar). In this project, Illumina's Sentrix
Human-6 Expression BeadChips are used to measure gene expression in B-lymphocyte cells for
all the individuals (Stranger et al., 2007). The genotypes for rare homozygous, heterozygous
and homozygous common alleles are coded by 0, 1, and 2, respectively. The raw data are
first background corrected and then quantile normalized across four replicates of a single
individual. Finally, they are median normalized across all individuals. We choose the 100 most
variable probes among the 47,293 probes corresponding to different Illumina TargetID. Each
selected probe corresponds to a different transcript (Mohammadi and Wit, 2012). A more
detailed discussion of this dataset can be found in Bhadra and Mallick (2013); Mohammadi
and Wit (2012); Chen et al. (2008); Stranger et al. (2007). Our goal is to infer the significant
interactions between gene transcripts.

We model the human gene expression dataset using the nonparanormal graphical model,
with sample size n = 60 and dimension d = 100. We use the CLIME estimator with the
regularization parameter selected by 5-fold cross validation for precision matrix estimation.
For each gene pair, the null hypothesis is that these two variables are conditionally independent
given the rest of the variables, and we apply the score test at the 0.05 significance level. We present
the resulting graph in Figure 3. The p-values are shown on the edges. Most of the obtained
edges agree with the results in existing work (Bhadra and Mallick, 2013). For example, the
connected component {GI_41190507S, Hs.512137S, Hs.512124S, Hs.449605S, GI_37546969S}
reveals an interaction structure that is similar to the structure discovered by Bhadra and
Mallick (2013). In addition, our method also finds some new gene-gene interactions, such as
the gene pairs hmm9615S and GI_21614524S, and GI_21614524S and GI_16554578S.

We also construct the confidence subgraph for the gene expression dataset. The estimation
procedure for $\Theta$ is the same. We use the Monte Carlo method to compute the quantile of the
bootstrap estimator $W$ defined in (3.17). The random sampling is repeated 1000 times.
Figure 4 shows the confidence subgraph. In comparison with the local test result in Figure 3,
the confidence subgraph has fewer edges. This is not surprising because the confidence interval
of each $\Theta^*_{jk}$ given by the global inference is wider than the confidence interval given by the
local inference due to the adjustment for the multiplicity of tests. Thus, more conservative
results are given in Figure 4.
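
The Monte Carlo step can be sketched as an empirical quantile computation. The function below takes a list of simulated bootstrap statistics, here called `draws` (a hypothetical name; generating the draws from the multiplier bootstrap is not shown), and returns the smallest value $t$ whose empirical distribution function is at least $\alpha$, mirroring the definition $c_W(\alpha) = \inf\{t : P_e(W \le t) \ge \alpha\}$.

```python
def empirical_quantile(draws, alpha: float) -> float:
    """Smallest draw t such that the fraction of draws <= t is at least alpha."""
    ordered = sorted(draws)
    total = len(ordered)
    for rank, value in enumerate(ordered, start=1):
        if rank / total >= alpha:
            return value
    return ordered[-1]
```

With 1000 draws, the estimated quantile at level 0.95 is the 950th smallest simulated statistic.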

In conclusion, our method recovers many well-established human gene interactions and
also identifies some additional gene interactions that may merit future scientific investigation.

6 Conclusions 

In this paper, we study both local and global asymptotic inference for the nonparanormal
graphical model. For local inference, we aim at testing the presence of a single edge. In
particular, we propose the score test and the Wald test. We provide theoretical guarantees on
the type I error as well as the local power of the score test. We also construct the confidence
interval for each edge based on the Wald test statistic. For global inference, we construct a
confidence subgraph. The asymptotic property of the confidence subgraph is also established.
The numerical results show that our proposed methods outperform the existing ones.

Figure 3: The inferred graph for the human gene expression data set. For each gene pair, we
apply the score test with 0.05 significance level. The p-values are shown on the edges.

Figure 4: The graph for the human gene expression data set given by global graph inference
with 0.05 significance level.

A Proof of Lemma 3.1

Proof. By permutation, we can write

$$H = \begin{bmatrix} H_{(jk),(jk)} & H_{(jk),(jk)^c} \\ H_{(jk)^c,(jk)} & H_{(jk)^c,(jk)^c} \end{bmatrix}.$$

In order to compute $\Omega = H^{-1}$, we can use the following blockwise matrix inversion identity
(Golub and Loan, 1996):

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} (A - BD^{-1}C)^{-1} & -(A - BD^{-1}C)^{-1}BD^{-1} \\ -D^{-1}C(A - BD^{-1}C)^{-1} & D^{-1} + D^{-1}C(A - BD^{-1}C)^{-1}BD^{-1} \end{bmatrix}. \quad (A.1)$$

Let $A = H_{(jk),(jk)}$, $B = H_{(jk),(jk)^c}$, $C = H_{(jk)^c,(jk)}$ and $D = H_{(jk)^c,(jk)^c}$. It can be seen
from (A.1) that

$$\Omega_{(jk),(jk)} = \big(H_{(jk),(jk)} - H_{(jk),(jk)^c} H^{-1}_{(jk)^c,(jk)^c} H_{(jk)^c,(jk)}\big)^{-1}, \quad (A.2)$$

$$\Omega_{(jk)^c,(jk)} = -H^{-1}_{(jk)^c,(jk)^c} H_{(jk)^c,(jk)} \big(H_{(jk),(jk)} - H_{(jk),(jk)^c} H^{-1}_{(jk)^c,(jk)^c} H_{(jk)^c,(jk)}\big)^{-1}. \quad (A.3)$$

Therefore, dividing (A.3) by (A.2), we obtain

$$\frac{\Omega_{(jk)^c,(jk)}}{\Omega_{(jk),(jk)}} = -H^{-1}_{(jk)^c,(jk)^c} H_{(jk)^c,(jk)} = -w^*,$$

which immediately yields $w^* = -\Omega_{(jk)^c,(jk)} / \Omega_{(jk),(jk)}$. Similarly, we can show that $\hat{w} =
-\hat{\Omega}_{(jk)^c,(jk)} / \hat{\Omega}_{(jk),(jk)}$. □
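
The blockwise inversion identity used in this proof can be checked on a small example. The sketch below treats the simplest case in which all four blocks are scalars and uses exact rational arithmetic, so the comparison with the direct 2 x 2 inverse is exact; the helper names are illustrative.

```python
from fractions import Fraction


def inverse_2x2(a, b, c, d):
    """Direct inverse of [[a, b], [c, d]], returned as (top-left, top-right, bottom-left, bottom-right)."""
    det = a * d - b * c
    return (d / det, -b / det, -c / det, a / det)


def block_inverse_scalar(a, b, c, d):
    """Inverse via the blockwise identity, with scalar blocks A=a, B=b, C=c, D=d."""
    schur = a - b * c / d                    # A - B D^{-1} C
    top_left = 1 / schur                     # (A - B D^{-1} C)^{-1}
    top_right = -top_left * b / d            # -(A - B D^{-1} C)^{-1} B D^{-1}
    bottom_left = -(c / d) * top_left        # -D^{-1} C (A - B D^{-1} C)^{-1}
    bottom_right = 1 / d + (c / d) * top_left * (b / d)
    return (top_left, top_right, bottom_left, bottom_right)
```

In this scalar case the ratio of the bottom-left entry to the top-left entry equals $-D^{-1}C$, which is the step used when dividing (A.3) by (A.2).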

B Proof of Results in Section 4

B.1 Proof of Theorem 4.7

In order to prove Theorem 4.7, we first lay out several key technical lemmas. The first lemma
establishes the limiting distribution of the decorrelated score function $\sqrt{n} S_n(0, \Theta^*_{(jk)^c})$ in the
high-dimensional setting, which is asymptotically normal. It is the key for establishing the
asymptotic normality of $\sqrt{n} \hat{S}_n(0, \hat{\Theta}_{(jk)^c})$.

Lemma B.1. Under Assumptions 4.1 and 4.2, we have

$$\sqrt{n} S_n(0, \Theta^*_{(jk)^c}) \rightsquigarrow N(0, 4\sigma^2),$$

where $\sigma^2 = R_{(jk),(jk)} - 2 R_{(jk),(jk)^c} w^* + w^{*\top} R_{(jk)^c,(jk)^c} w^*$.

Proof. See Appendix C.1 for a detailed proof. □

Lemma B.1 not only shows that the asymptotic distribution of $\sqrt{n} S_n(0, \Theta^*_{(jk)^c})$ is normal, but
also gives the variance of the limiting distribution. In particular, the variance of the
limiting normal distribution is determined by $R$ and $w^*$. The following lemma shows that
the Hessian of the negative log-pseudo likelihood at $\hat{\Theta}$ is well-concentrated around the Hessian of
the negative log-pseudo likelihood at $\Theta^*$, in terms of the matrix elementwise sup-norm.

Lemma B.2. Suppose 0* G U{s,M). Under Assumptions 4.2 and 4.3, if n is sufficiently 
large, we have 


Proof. See Appendix C.2 for a detailed proof. □ 

The following lemma provides an £i norm based estimation error for the estimator w(0) 
in (3.6). 

Lemma B.3. Suppose 0* G U{s,M). Under Assumptions 4.2 and 4.3, if n is sufficiently 
large, we have 


|w(0) — w*||^ = 


jlogd 


+ s 


n 


logd 


n 


Proof. See Appendix C.3 for a detailed proof. □ 

In order to show the asymptotic normality of S'n(0), we also need Assumption 4.5, which 
together with Lemma B.2 and B.3 guarantee that 0(jfc)c) — S'n(0, 0*^.^^^)) = op(l). 

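Proof of Theorem 4.7. In this proof, we use $\hat{w}$ as the shorthand for $\hat{w}_{(jk)}(\hat{\Theta})$. We have

$$\hat{S}_n(0, \hat{\Theta}_{(jk)^c}) = -\hat{\Sigma}_{jk} + [\hat{\Theta}^{-1}]_{jk} - \hat{w}^\top \mathrm{vec}\big(-\hat{\Sigma}_{(jk)^c} + [\hat{\Theta}^{-1}]_{(jk)^c}\big)$$

$$= -\hat{\Sigma}_{jk} + [\Theta^{*-1}]_{jk} - (w^*)^\top \mathrm{vec}\big(-\hat{\Sigma}_{(jk)^c} + [\Theta^{*-1}]_{(jk)^c}\big) - [\Theta^{*-1}]_{jk} + [\hat{\Theta}^{-1}]_{jk}$$

$$\quad + (w^* - \hat{w})^\top \mathrm{vec}\big(-\hat{\Sigma}_{(jk)^c} + [\hat{\Theta}^{-1}]_{(jk)^c}\big) - (w^*)^\top \mathrm{vec}\big([\hat{\Theta}^{-1}]_{(jk)^c} - [\Theta^{*-1}]_{(jk)^c}\big).$$

The right hand side of the above equation can be reorganized as follows:

$$\hat{S}_n(0, \hat{\Theta}_{(jk)^c}) = S_n(0, \Theta^*_{(jk)^c}) + \underbrace{(w^* - \hat{w})^\top \mathrm{vec}\big(-\hat{\Sigma}_{(jk)^c} + [\hat{\Theta}^{-1}]_{(jk)^c}\big)}_{(i)}$$

$$\quad \underbrace{- [\Theta^{*-1}]_{jk} + [\hat{\Theta}^{-1}]_{jk} - (w^*)^\top \mathrm{vec}\big([\hat{\Theta}^{-1}]_{(jk)^c} - [\Theta^{*-1}]_{(jk)^c}\big)}_{(ii)}. \quad (B.1)$$

In the following, we bound terms (i) and (ii) respectively.

Bounding term (i): We have that

$$(i) \le \|\hat{w} - w^*\|_1 \cdot \big\|\mathrm{vec}\big(-\hat{\Sigma}_{(jk)^c} + [\hat{\Theta}^{-1}]_{(jk)^c}\big)\big\|_\infty \le \|\hat{w} - w^*\|_1 \cdot \|-\hat{\Sigma} + \hat{\Theta}^{-1}\|_{\max}, \quad (B.2)$$

where the first inequality follows from Hölder's inequality, and the second inequality follows
from $\|\mathrm{vec}(A)\|_\infty = \|A\|_{\max}$. For the right hand side of (B.2), by Lemma B.3, we have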


|w — w*||i = Of 


jlogd 


n 



(B.3) 


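Furthermore, we have $\|-\hat{\Sigma} + \hat{\Theta}^{-1}\|_{\max} \le \|-\hat{\Sigma} + \Sigma^*\|_{\max} + \|-\Theta^{*-1} + \hat{\Theta}^{-1}\|_{\max}$. Since
$\|-\hat{\Sigma} + \Sigma^*\|_{\max} = O_P(\sqrt{\log d / n})$, together with Lemma B.2, we can obtain that

$$\|-\hat{\Sigma} + \hat{\Theta}^{-1}\|_{\max} = O_P(\sqrt{\log d / n}). \quad (B.4)$$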

Furthermore, plugging (B.3) and (B.4) into (B.2), we get 

= (B,5) 

according to Assumption 4.5. 

Bounding term (ii): Consider the Taylor expansion of 0 at 0*, we have 


0-1 = 0*-i - 0*-1A0*"^ - (0*-1A)2^(-1)^(0*"^A)^0*"^ 

k=0 

= 0*-i - 0*-1A0*-i -Ri(A), (B.6) 

where Ri(A) = (0*-iA)2j0*-i = (0*-iA)R(A), R(A) = 
and J = J2T=oi~^)^i®*~^ ■ Fy (B.6), we have 

[0-1 - 0*-i]^., = -[0*-i],,A[0*-i],, - [Ri(A)]^., 

= -([©*-']a:* ® [©*-ib*)vec(A) - [Ri(A)] 

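$$= -(\Theta^{*-1} \otimes \Theta^{*-1})_{(jk),\cdot}\, \mathrm{vec}(\Delta) - [R_1(\Delta)]_{jk}, \quad (B.7)$$

where the second equality follows from $a^\top B c = (c^\top \otimes a^\top)\mathrm{vec}(B)$. Since $\Delta_{(jk)} = 0$, the right
hand side of (B.7) can be further rewritten as

$$[\hat{\Theta}^{-1} - \Theta^{*-1}]_{jk} = -(\Theta^{*-1} \otimes \Theta^{*-1})_{(jk),(jk)^c}\, \mathrm{vec}(\Delta_{(jk)^c}) - [R_1(\Delta)]_{jk}. \quad (B.8)$$

By (B.6), we can also get

$$\mathrm{vec}\big([\hat{\Theta}^{-1} - \Theta^{*-1}]_{(jk)^c}\big) = -\mathrm{vec}\big([\Theta^{*-1} \Delta \Theta^{*-1}]_{(jk)^c}\big) - \mathrm{vec}\big([R_1(\Delta)]_{(jk)^c}\big)$$

$$= -(\Theta^{*-1} \otimes \Theta^{*-1})_{(jk)^c,\cdot}\, \mathrm{vec}(\Delta) - \mathrm{vec}\big([R_1(\Delta)]_{(jk)^c}\big), \quad (B.9)$$

where the second equality follows from $\mathrm{vec}(ABC) = (C^\top \otimes A)\mathrm{vec}(B)$. Again, since $\Delta_{(jk)} = 0$,
the right hand side of (B.9) can be further rewritten as

$$\mathrm{vec}\big([\hat{\Theta}^{-1} - \Theta^{*-1}]_{(jk)^c}\big) = -(\Theta^{*-1} \otimes \Theta^{*-1})_{(jk)^c,(jk)^c}\, \mathrm{vec}(\Delta_{(jk)^c}) - \mathrm{vec}\big([R_1(\Delta)]_{(jk)^c}\big). \quad (B.10)$$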
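
The two vectorization identities used in this step, $a^\top B c = (c^\top \otimes a^\top)\mathrm{vec}(B)$ and $\mathrm{vec}(ABC) = (C^\top \otimes A)\mathrm{vec}(B)$, can be verified numerically. The following sketch uses plain Python lists and column-stacking for $\mathrm{vec}$; the helper functions are written for this illustration only.

```python
def matmul(X, Y):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def kron(X, Y):
    """Kronecker product: block (i, j) of the result is X[i][j] * Y."""
    return [[x * y for x in row_x for y in row_y] for row_x in X for row_y in Y]


def vec(X):
    """Stack the columns of X into one long vector."""
    return [X[i][j] for j in range(len(X[0])) for i in range(len(X))]


def matvec(X, v):
    return [sum(x * vi for x, vi in zip(row, v)) for row in X]
```

For integer matrices $A$, $B$, $C$ of compatible sizes, `vec(matmul(matmul(A, B), C))` and `matvec(kron(transpose(C), A), vec(B))` return the same list.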


Combining (B.8) and (B.IO), we obtain 

(ii) = -(0*"^ (g) 0*"^)Q-fc)_Q-fc)cvec(A(jfe)c) - [Ri(A)]^.^ 

- (w*,-(0*"^ (g) 0*"^)(jfc)c_Qfc)cvec(A(jfc)c) - vec([Ri(A)]Qfc)c)) 

iy^ ) R(jfc)°,(jfc)'^^®c(^(j7c)'^)) [Ri(^)]jfc 

(ii).a (ii).b 

+ (w*,vec([Ri(A)](jfe)c)) . 

'-V-" 

(ii).c 


By the definition of w* = \H.(^j},Y,(jk)A~^'^(jk),{jkY^ have (ii).a = 0. It remains to bound 
terms (ii).b and (ii).c. Recall that Ri(A) = (0*“^A)^J0*“^ = (0*“^A)R(A). In the proof 
of Lemma B.2, we have shown that 


|R(A) 


< a|*/2||a| 


Therefore, We have 

(ii).b < ||Ri(A)|Uax = ||(0*-^A)R(A)||^^^ < ||0*-iA||oo • ||R(A)|Uax 

^ II®* ^l|oo • ||A||oo • ||R(A)||max ^ /2||A||oo ' ||A||max) (t^-H) 


where the second inequality follows from $\|AB\|_{\max} \le \|A\|_\infty \cdot \|B\|_{\max}$; the third inequality
is due to $\|AB\|_\infty \le \|A\|_\infty \cdot \|B\|_\infty$; and the fourth inequality follows from Assumption 4.2.
Since $\|\Delta\|_\infty = \|\Delta\|_1 = O_P(s\sqrt{\log d / n})$ and $\|\Delta\|_{\max} = O_P(\sqrt{\log d / n})$, we have (ii).b $=
O_P(s \log d / n)$. Similarly, we can show that

$$\text{(ii).c} \le \|w^*\|_1 \cdot \big\|\mathrm{vec}\big([R_1(\Delta)]_{(jk)^c}\big)\big\|_\infty \le \|w^*\|_1 \cdot \|R_1(\Delta)\|_{\max} = O_P(s \log d / n),$$

since


w 1 = 



( 

^(jk)4jk) 

1 




< 


IR* 0* I 


< 




Amin(0 


* \2 


where last inequality follows from 0* G U{s,M) and Assumption 4.2. Combining terms (ii).a, 
(ii).b and (ii).c, we obtain 


(ii) = Of> {s log d/n) = op(n 


(B.12) 


according to Assumption 4.5. Therefore, substituting (B.5) and (B.12) into (B.l), we get 
^/nSn{0, 0(jfc)c) = \/nSn{0, 0q-^^c) + op(l). Combining with Lemma B.l, we complete the 
proof. □ 

B.2 Consistency of $\hat{\sigma}^2$

In this subsection, we show that $\hat{\sigma}^2$ in (4.3) is a consistent estimator of $\sigma^2$. This, together
with Theorem 4.7, establishes the validity of the score test statistic in (3.9).

Theorem B.4. For $\sigma^2$ and $\hat{\sigma}^2$ as defined in Theorem 4.7 and (4.3), under Assumptions 4.1, 4.2
and 4.3, we have


\a — a \ = 




3/2 


^logd llogd\ 

+ S -h S1 

n 


\ n J ' n ' \ n J 

Before we prove Theorem B.4, we need the following auxiliary lemma. 
Lemma B.5. Under Assumptions 4.1, 4.2 and 4.3, we have 


R-R 


= Op 


logd 


n 


Proof. See Appendix C.4 for a detailed proof. 
Proof of Theorem B.4. We have 

\a — cr I < +2 


(B.13) 

□ 


w 


(i) (ii) 

'-V-' 

(iii) 

It remains to bound terms (i), (ii) and (iii) respectively. 

Bounding term (i): By Lemma B.5, we have (i) $= O_P(\sqrt{\log d / n})$.

Bounding term (ii): By the triangle inequality, term (ii) can be upper bounded by




W 


(ii).a (ii)-t' 

We are going to bound terms (ii).a, (ii).b and (ii).c respectively. Term (ii).a is bounded by 

iiB TD II II- „,*ii , ^ogd\ 

Iloo • ||W W 111 (ypls I ^ ^ )> 

where the equality follows from Lemma B .5 and Lemma B. 3 . To bound term (ii).b, we have 

r•■^K^llT3 II II- *11 f 2^ogd /logd\ 

(n).b < ||%fc),(ifc)-IU • ||w-w ||i = Op(^s ~ + 

since ||R(j7j) ||oo = Op(l). To see this, we note that each element of R is a continuous 
function of the elements of H, and each element of S is bounded. To bound term (ii).c, we 
have 

(ii).c ^ ||w 111 • ||R^j^yQ-^^c Iloo Op('\/^log~d/n^, 

because ||w*||i < It follows immediately that (ii) = Op(s^(logd/n)^^^ + s\ogd/n). 

Bounding term (in): We have 


(hi) < |w'''(R, 




(iii).a 


(m).b 


logd 


n 


For term (iii).a, we have 

(iii).a ^ ll^lll * Umax ^P 

because ||w*||i < For term (iii).b, using the fact that 


Iv’^Wv - v’^Wvl < ||W||max ' ||v - v||i + 2||W||max ' ||v||l • ||v - v||i 


for any symmetric matrix W G and v, v G M“, we obtain 


(iii).b < ||R(jfc)c^Qfc)c||max • ||w - W*||i + 2||R(jfc)^(jfc)c||oo • ||w*||i • ||w - w*||i 




n 


I ? 

n J 


because ||R(jfc)c,(jfc)c||max = Op(l), l|R-(iA:),(jfc)'=||oo = Op(l) and ||w*||i < It follows 
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that 


(iii) = Op ( s 


alogd , _ llogd\ 


+ s 


)■ 


n \ n 

Combining terms (i), (ii) and (iii) together, we complete the proof. 

B.3 Proof of Corollary 4.8 

Proof. By Theorem 4.7 and Theorem B.4, we have 


(B.14) 

□ 


ST„ = y/nSn{0, 0Qfc)c)/CT iV(0,1). 

Thus, we have $P(\Psi_S(\alpha) = 1 \mid H_0) = P(ST_n^2 > \chi^2_{1\alpha}) \to \alpha$. Similarly, for any $t \in (0, 1)$, we have

$$P(P_S \le t) = P\big(\Phi(|ST_n|) \ge 1 - t/2\big) = P\big(|ST_n| \ge \Phi^{-1}(1 - t/2)\big) \to t.$$

This completes the proof. □

B.4 Proof of Theorem 4.10 

In order to prove Theorem 4.10, we first lay out several auxiliary lemmas as follows. 

First, we have the following result, which essentially guarantees uniformly local asymptotic 
normality condition for Qjk- 

Lemma B.6. Under Assumption 4.2, we have 


lim inf P© 

n^oo ®£Ui 






^jk^ijk)\{jkp 


V n 


= 1 , 


where = lli{K,r], s*, M*), = [QjjQkk + Q%] /{Qjj&kk)‘^ and Cq > 0 is a positive 

constant. 

Proof. See Appendix C.5 for a detailed proof. □ 

Lemma B.7. Under Assumptions 4.1 and 4.2, we have 


lim inf P© 

n^oa &eUi{K,ri,s*,M*) 


IR-RII 



= 1 , 


(B.15) 


where Cg is a positive constant. 

Proof. See Appendix C.6 for a detailed proof. □ 

In addition, the uniform central limit theorem holds in $\mathcal{U}_1(K, \eta, s^*, M^*)$.

Lemma B.8. Under Assumptions 4.1 and 4.2, we have


lim sup sup 

®£Ui{K,r],s*,M*) tm 


P© 



(b^Rb)’ 


2 b'''vec 



4>(t) 


= 0 , 


where b = [1, —(w)"'"]''". 

Proof. See Appendix C.7 for a detailed proof. □ 

We have that the estimation error bound of w(0) holds uniformly in for any 0 in 

Lemma B.9. Under Assumptions 4.2 and 4.9, if n is sufficiently large, the estimator w 
satisfies 


lim inf P© 

n^oa &eUi{K,'q,s*,M*) 


|w(0) — w||i < Cs( S 


n 


+ s 


logd\ 


n 


= 1 , 


where w = —(0 0 i® ® C 3 is a positive constant. 

Proof. See Appendix C.8 for a detailed proof. 


□ 


The following lemma shows that the uniform convergence of the Hessian matrix can be
attained under Assumptions 4.2 and 4.9.

Lemma B.10. Under Assumptions 4.2 and 4.9, if $n$ is sufficiently large, we have

I log d \ 


lim inf P©( ||0 ^ - 0 ^||max < Cs 

n^oo &£Ui{K,ri,s*,M*) 


= u 


lim inf P© —51 + 0 

n^oo &eUi{K,'q,s*,M*) 


- 1 | 


<Ce 


n J 

)-■ 


logdN 


n 


(B.16) 

(B.17) 


where H = 0“^ (g) 0“^, C 5 and Cq are constants. 

Proof. See Appendix C.9 for a detailed proof. □ 

Lemma B.ll. Under Assumptions 4.1, 4.2 and 4.9, we have 


lim inf P© 

n—^oo ©gWi 


Sn{0,@ijkr) - Sn{@) + 


< CsCes^ 


n J n 


= 1 , 

(B.18) 


where = Ki{K,r}, s*, M*), = [©jjQfcfc + ©jJand Ca and Ce 

are the same as the constants in Lemmas B.9 and B.IO, (7 is a constant. If in addition 
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Assumption 4.5 holds, we have 


lim sup sup 


lim sup sup 

n^oo e,^i(i(K,ri,s*,M*) * 6 ® 


¥&^^/nSn{0,&(jk)<=)/{2a)<t\-^{t) =0, if ?? > 1/2, (B.19) 


© 


n5n(O,0(jfc)c)/(2(T) <t) t + K—j 




= 0 , 


(B.20) 


if 7? = 1 / 2 , 

and for any fixed f G M and iT 7 ^ 0 , we have 

lim sup supP© f|-v/n5'n(0, ©(,Mc)/(2cr)| < f) = 0, if ?? < 1/2. (B.21) 


n^oo &^iii{K,ri,s*,M*) iSR 
Proof. See Appendix C.IO for a detailed proof. 
Proof of Theorem 4-^0. Define the following events: 


□ 


£3 = |||w(0) - w||^ < C 3 

£ 3=1 HR- — R||max < C's 


2 logd logd\ 
s - h S' ' ' 


n 


I ( ’ 
n J I 


n / 


By the similar proof of Theorem B.4, and invoking Lemma B.9, we have 


inf P© 

©eWo(s*,M*) 


a'^ — a^\ < Cl s 


2 logd /logd\ 


V n y 


+ + 
n 


n J 


> inf P©(T 3 nT 8 )>l- sup P©(T 3 ) - sup P©(T 8 )-^ 1 - 

&eUi{K,r],s*,M*) &£Ui{K,ri,s*,M*) &&Ui{K,ri,s* ,M*) 

Let Un = ST„ = y/nSn{f^,@[jkY) /{2d), and Un = y/nSn{f^,@[jkY) /{2a), For any t and a 
positive sequence 5^ —^ 0 , we have 

P©(ST„ < t) - 4> = ¥&{Un<t)-¥&{Un<tP5n\ 

(i) 

+ P© ( Un < t + ^ \ t p K -h (5^ ) + ( t + iL h <5^ ) — t p K -— ) 


2cj 


2cj 


2fT / 


(ii) (iii) 

Bounding term (i): By triangle inequality, we have 

sup |p©(//n <£) - P©(t^n < i + (5n)| < P(|^n “ Un\ > 6n) = P(|C4i| 'll - a/u| > 6n) 

<F{\Un\ > 1/6n) PF{\1 - a/a\>5^n)- 


(i).a 


(i).b
The term (i).a can be further bounded by


( 


> cMI +ipf 

1 

1 

1 

V 

2a 

- n j\ y 

2a 


IP©(|t/n| > l/5n) < |lP©(|?7n| >5;:^) 
where Z is a standard normal random variable. By Lemma B.ll, we have 


>^n\, 


lim sup sup sup 

n—>-oo &£Ui{K,ri,s*,M*) 5n 


r&{\Un\>5-^)- 


( 

z-aT 

>Sn") 

V 

2a 

— n I 


= 0 . 


Since the tail bound for the standard normal distribution yields P(|Z — Ka‘^/{2a)\ >6 ^^) < 
^i\Z\ > — Ka'^I{2a)) < 2/^2'K{bn^ — Ka"^l{2a)) exp [ — — Ka’^j (2it))^/ 2] —)■ 0 as 

bn —> 0, we have P(|?7n| > l/<5n) < 0. For term (i).b, we have 


l-a/a\ >bl) = ¥&(^a'^ - a'^\/{{a + a)d) > 


Let rjn = s^M^(log(i/n)^/^ + s^M®logd/n + sM^y^log d/n. Given events T 3 and Tg, and the 
assumption that a > k, there exists a constant C such that \a‘^ — cj^|/(( a + a)a') < Crjn, since 
a > K — Tjn > k/ 2 for sufficiently large n. Hence, by setting bn = Crjn , for some sufficiently 
large constant C, we have P©(|1 — cr/a\ > bn) = 0. Thus, we obtain 

lim sup sup supP©(17n ^ t) ~ ^&{Un < t + bn) < 0. (B.22) 

n^oo &£Ui{K,'n,s*,M*) tSK 

Bounding Term (ii): By Lemma B.ll, we have 

lim sup sup sup ■[p©(t/n < i + iL-—h 1 < 0. (B.23) 

n^oo &£Ui{K,'n,s*,M*) tm [ V 2(7 / J 

Bounding Term (iii): We have <h(t + iL(T^/(2cr) + (in) — *^(1 +/L(T^/(2cr)) < which 

immediately implies that 

lim sup sup sxvp i^{t + Ka^ / {2a) + bn) — + Ka^ / {2a)')\ <Q. (B.24) 

n-s-oo &£Ui{K,ri,s*,M*) iSR ^ 


Combining (B.22), (B.23) and (B.24), we obtain that 

limsup sup sup |p©(ST„ < t) — <h(t + iL(T)| < 0. 
n-s-oo &£Ui{K,rj,s*,M*) iSR ^ 


By similar arguments, we can obtain that 


sup sup sup |p©(ST„ < t) — 4>(t + LC(t)| > 0. 

>■00 (^aUaK r, s* M*^ f 1 


lim 

n-^-oo &£Ui{K,rj,s*,M*) iSR 


This completes the proof of (4.6). 


□

B.5 Proof of Theorem 4.11

Proof. In this proof, we use $\hat{w}$ as the shorthand for $\hat{w}(\hat{\Theta})$. By the definition of $\tilde{\Theta}_{jk}$, we have

- Q*jk) = - 0-fc) - \/^(V(,fc)540))-'5n(0). (B.25) 

Consider the Taylor expansion of Sn{®) at 0*^ as follows 

Sn{®) = SniQ'jk, ®ijkp) + (jk)Sn{Q{jk): ®{jkp)i®jk “ 0jfc)) (B.26) 

where Q{jk) lies in between 0*^ and Qjk- Substituting (B.26) into (B.25), we obtain 

= ^(®jk - e*k) - V^{^uk)Sni®))~"[Sn{e*k,&ukp) 

+ V(jfc)S'n( 0 (jfc), 0 (jfc)c)( 0 jfc — &*jk)] 

= -V^{Vyf,)Sni&)y^Snie*k,Qukr) 

'-V-' 

(i) 

+ VfiiQjk - Q*k) (l - yUk)Sn{&)y"V(jk)Sn{®Uk), ©w^)) • (B.27) 

'-V-' 

(ii) 

In the following, we are going to bound terms (i) and (ii) separately. 

Bounding Term (i): We first show that V(jfc)S'n(0) A By triangle inequality, 

we have 


|V(jfc)5n(0) < \'^(jk),{jkYn{®) ^{jk)(jk)\ + {jk),(jkp^‘n{® 


0)w-H 


{jk)(jky 


:W 


(iii) 


(iv) 


(B.28) 


By Lemma B.2, we have (iii) < ||V^I’n(0) — H|| ma v = Op(y^log d/n). By the triangle 
inequality, we have 

(iv) < l(V(jfc),O'fc)c4(0) - HQfc)_Q-fc)c)(w - w*)| + |HQ-fc)_Q-fc)c(w - w*)| 

'-V-^ '---^ 

(iv).a (iv).b 

+ “ ^{jk),ijkp)^*\ ■ (B.29) 

'-V-' 

(iv).c 

For term (iv).a, we obtain 

(iv).a< ||w-w*||^ • ||V^^.;i,) (^.fc)J(0)-H(^fc),Qfc)c|U 

where the inequality follows from Hölder's inequality, and the equality follows from Lemma B.3
and Lemma B.2. For term (iv).b, we have


(iv).b < ||w - W*||^ • ||HQfc)^(jfc)c||max = , (B.31) 

where the inequality follows from Holder’s inequality, and the equality follows from Lemma B.3 
and ||H||max = Op(l). For term (iv).c, we have 

(iv).c < ||w*||i • - tim,uk)4oo = Op(/^), (b.32) 

where the first inequality follows from Holder’s inequality, and the second inequality follows 
from Lemma B.2 and ||w*||max < Substituting (B.30), (B.31) and (B.32) into (B.29), 

we obtain 


(iv) = 




(B.33) 


Hence, combining terms (iii) and (iv), and substituting them into (B.28), we have

2 /logd\^^^ logd [\ogd\ 


^ijk)Sn{&) - H^jk)\ijkr\ = 1^ — j + S— + Y — J = Op(l) 


(B.34) 


according to Assumption 4.5. Therefore, by (B.34), we have V A By 

Theorem 4.7 and Slutsky’s theorem, we have 

(i) = + 0,(1) -- jv(o,4o=jf,-2,|,.„,). (b.ss) 

Bounding Term (ii): We aim to show that (ii) = op(l). In detail, we have 

(ii) < IV^iQjk - e*k )\• |i - {Vijk)Sn{Q)y^Vijk)Snieijk),&ukr)\ 

< VnlQjk - Q*k\ ■\{V(jk)Sn{®))~^\ ■ \VQk)Sn{®) - VQk)Sn{Qijk),®ijkr)\ ■ (B.36) 

-V-^ -V-^ ^-V-^ 

(ii).a 

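For term (ii).a, we have (ii).a $\le \|\hat\Theta - \Theta^*\|_{\max} = O_P(\sqrt{\log d/n})$. For term (ii).b, we have shown in (B.34) that $\nabla_{(jk)} S_n(\hat\Theta) \xrightarrow{p} H_{(jk)|(jk)^c}$. Therefore, we have (ii).b $\xrightarrow{p} |H_{(jk)|(jk)^c}^{-1}| = O(1)$.

For term (ii).c, by the triangle inequality, we have

\[
\text{(ii).c} \le \underbrace{|\nabla_{(jk)} S_n(\hat\Theta) - H_{(jk)|(jk)^c}|}_{I_1} + \underbrace{|H_{(jk)|(jk)^c} - \nabla_{(jk)} S_n(\tilde\Theta_{jk}, \hat\Theta_{(jk)^c})|}_{I_2}.
\tag{B.37}
\]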
By a similar argument as in (B.28), we can show that




Substituting (B.38) into (B.37), we can get




n 


(B.38) 




(B.39) 


Combining terms (ii).a, (ii).b and (ii).c, substituting them back into (B.36), and invoking
Assumption 4.5, we obtain




3/2 


+ '^l=C»(l). 


n 


(B.40) 


Substituting (B.35) and (B.40) into (B.27), by Slutsky's theorem, we have

This completes the proof. 


□ 


B.6 Proof of Corollary 4.12 

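Proof. Note that by Theorem B.4, we have $|\hat\sigma^2 - \sigma^2| = o_P(1)$. It is sufficient to show that $|\hat H_{(jk)|(jk)^c} - H_{(jk)|(jk)^c}| = o_P(1)$. Note that

\[
|\hat H_{(jk)|(jk)^c} - H_{(jk)|(jk)^c}| \le \underbrace{|\nabla^2_{(jk),(jk)}\ell_n(\hat\Theta) - H_{(jk),(jk)}|}_{\text{(i)}} + \underbrace{|\nabla^2_{(jk),(jk)^c}\ell_n(\hat\Theta)\,\hat w - H_{(jk),(jk)^c} w^*|}_{\text{(ii)}}.
\tag{B.41}
\]

For term (i), according to Lemma B.2, we have

\[
\text{(i)} \le \|\nabla^2 \ell_n(\hat\Theta) - H\|_{\max} = O_P\Big(\sqrt{\frac{\log d}{n}}\Big) = o_P(1).
\tag{B.42}
\]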


For term (ii), by similar argument as in (B.29), we can show that 


(B.43) 
according to Assumption 4.5. Combining terms (i) and (ii) in (B.42) and (B.43) and substituting them into (B.41), we obtain

\[
|\hat H_{(jk)|(jk)^c} - H_{(jk)|(jk)^c}| = o_P(1).
\]

Applying Slutsky's theorem completes the proof. □


B.7 Proof of Corollary 4.13 

Proof. By Theorem 4.11 and Corollary 4.12, we have 

= l\Ho) = FiW^ > xla) = «• 


Similarly, for any t G (0,1), we have 

< i) = Ip($(lPn) > 1 - 0 
This completes the proof. 


Wn>^ 




□ 


B.8 Proof of Theorem 4.14 

In order to analyze the theoretical properties of the confidence subgraph, we introduce intermediate test statistics $T_0$, $T_1$ and $T_2$ as follows:

\[
T_0 = \max_{(j,k)\in[d]\times[d]} \frac{1}{\sqrt{n}} \sum_{i=1}^n Z_{ijk}, \tag{B.44}
\]
\[
T_1 = \max_{(j,k)\in[d]\times[d]} \sqrt{n}\,(\hat\Theta_{jk} - \Theta^*_{jk}) \cdot H_{(jk)|(jk)^c}/(2\sigma), \tag{B.45}
\]
\[
T_2 = \max_{(j,k)\in[d]\times[d]} \sqrt{n}\, S_n(\Theta^*)/(2\sigma), \tag{B.46}
\]

where $Z_{ijk} = -1/(2\sigma) \cdot b^\top \mathrm{vec}(T \odot G^*)$. Recall that $S_n(\Theta^*) = b^\top \mathrm{vec}(T \odot G)$. Using $T_1$ and $T_2$ as intermediate test statistics, we can establish the connection between $T_0$ and $T$. Furthermore, we introduce another test statistic $W_0$ as follows:


\[
W_0 = \max_{(j,k)\in[d]\times[d]} \frac{1}{\sqrt{n}} \sum_{i=1}^n Z_{ijk}\, e_i, \tag{B.47}
\]


where $\{e_i\}_{i=1}^n$ is a sequence of i.i.d. standard normal random variables that are independent of $\{X_i\}_{i=1}^n$. Note that both $T_0$ and $W_0$ have the structure of a sum of independent random variables. In what follows, we are going to build connections among the quantiles of $T_0$, $T$, $W_0$, and $W$. We define the multiplier bootstrap estimator of the $\alpha$-quantile of $T_0$ as the conditional $\alpha$-quantile of $W_0$ given $\{X_i\}_{i=1}^n$, that is,

\[
c_{W_0}(\alpha) = \inf\{t \in \mathbb{R} : P_e(W_0 \le t) \ge \alpha\},
\]

where $P_e$ is the probability measure induced by the multiplier variables $\{e_i\}_{i=1}^n$ given fixed $\{X_i\}_{i=1}^n$, i.e., $P_e(W_0 \le t) = P(W_0 \le t \mid \{X_i\}_{i=1}^n)$.
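The conditional quantile of a multiplier statistic can be approximated by Monte Carlo over the Gaussian multipliers. The following is a minimal sketch, assuming the summands are already available as an array `Z` of shape `(n, p)` with one column per pair $(j,k)$; the function name `multiplier_bootstrap_quantile` and the parameters `n_boot` and `seed` are hypothetical and are not part of the paper.

```python
import numpy as np

def multiplier_bootstrap_quantile(Z, alpha, n_boot=2000, seed=0):
    """Monte Carlo approximation of the conditional alpha-quantile of
    max_j n^{-1/2} * sum_i Z[i, j] * e_i, with Z held fixed and
    e_1, ..., e_n i.i.d. standard normal multipliers."""
    rng = np.random.default_rng(seed)
    n = Z.shape[0]
    e = rng.standard_normal((n_boot, n))      # one row of multipliers per draw
    W0 = (e @ Z / np.sqrt(n)).max(axis=1)     # bootstrap copies of the max statistic
    W0_sorted = np.sort(W0)
    # Smallest t whose empirical conditional CDF is at least alpha,
    # i.e. the ceil(alpha * n_boot)-th order statistic.
    idx = max(int(np.ceil(alpha * n_boot)) - 1, 0)
    return W0_sorted[idx]

# Usage on synthetic summands (illustrative data only).
Z_demo = np.random.default_rng(1).standard_normal((200, 10))
print(multiplier_bootstrap_quantile(Z_demo, 0.95))
```

Only the multipliers are resampled; the data-dependent array `Z` is computed once, which is what makes the procedure cheap relative to a full resampling bootstrap.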

In order to prove Theorem 4.14, we need a series of auxiliary lemmas. We first give a 
lemma, which is proved in Chernozhukov et al. (2012). 

Lemma B.12. (Chernozhukov et al., 2012) Suppose that there exist constants
$c_1, c_2, C_1, C_2 > 0$ and a sequence $B_n \ge 1$, $B_n \to \infty$ as $n \to \infty$, such that


1 

2=1 

(B.48) 

1 "" 

max - -kE[exp(|Zjjfc|/B„)] < 4, 

’ 2=1 

(B.49) 


and Bn{log{dn)y/n < C 2 U for 1 < fc < re and {j,k) G E'^. Then there exists constants 
c > 0 and C > 0 depending only on ci, Ci, C 2 and C 2 such that 


sup |T’(ro < cuo{a)) — a| < Cn 
oe(o,i) 

Lemma B.12 shows that C[/p(a) is an estimator of the a-quantile of T. Note that in our 
model w* is sparse and Zijk is sub-exponential for all i,j, k. Setting Bn = Ci, we get 

1 ” 

> Cl and E[exp(|Zjjfc|/Ci)] < 2. 

^ • 1 

2=1 

This verifies the conditions (B.48) and (B.49). 

Lemma B.13. (Chernozhukov et ah, 2012) Let and 1^0 be centered Gaussian random 

vectors in with covariance matrices and respectively. Suppose that there are 

(2) 

some constants 0 < ci < Ci such that ci < < Ci for all j = 1,... ,p. Then there exists a 

constant C > 0 depending only on ci and Ci such that 


sup 

teK 


P( max < i) — 
i<j<p J 


( max < t) 
i<j<p J 


<CAy'(lVlog(p/Ao))2/^ 


where Aq = max^-^fc - E^.^^|. 

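Let $U_0 = \max_{(j,k)\in[d]\times[d]} \frac{1}{\sqrt{n}} \sum_{i=1}^n Y_{ijk}$, where $\{Y_{ijk}\}_{(j,k)\in[d]\times[d],\, i=1,\ldots,n}$ are Gaussian analogs of $\{Z_{ijk}\}_{(j,k)\in[d]\times[d],\, i=1,\ldots,n}$. Then we have the following result.

Lemma B.14. Suppose that there exist some constants $0 < c_1 < C_1$ such that $c_1 \le \frac{1}{n}\sum_{i=1}^n \mathbb{E}[Z_{ijk}^2] \le C_1$ for all $(j,k) \in [d]\times[d]$. Then for every $\alpha \in (0,1)$,

\[
P\big(c_{W_0}(\alpha) \le c_{U_0}(\alpha + \pi(\vartheta))\big) \ge 1 - P(\Delta > \vartheta),
\]
\[
P\big(c_{U_0}(\alpha) \le c_{W_0}(\alpha + \pi(\vartheta))\big) \ge 1 - P(\Delta > \vartheta),
\]

where $\Delta = \max_{(j,k),(j',k')\in[d]\times[d]} \big|\frac{1}{n}\sum_{i=1}^n (Z_{ijk}Z_{ij'k'} - \mathbb{E}[Z_{ijk}Z_{ij'k'}])\big|$, $\pi(\vartheta) = C_2 \vartheta^{1/3}(1 \vee \log(|E^c|/\vartheta))^{2/3}$, and $C_2$ is a constant that only depends on $c_1$ and $C_1$.

Proof. On the event $\{\Delta \le \vartheta\}$, we have $|P(U_0 \le t) - P_e(W_0 \le t)| \le \pi(\vartheta)$ for all $t \in \mathbb{R}$, so

\[
P_e\big(W_0 \le c_{U_0}(\alpha + \pi(\vartheta))\big) \ge P\big(U_0 \le c_{U_0}(\alpha + \pi(\vartheta))\big) - \pi(\vartheta) \ge \alpha,
\]

implying the first claim. The second claim follows similarly. □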

We need the following auxiliary lemma, which shows the basic approximation properties 
for multiplier bootstrap method. 

Lemma B.15. For $T$, $T_0$, $W$, $W_0$ defined in (B.44), (3.15), and (B.47), under the same assumptions of Theorem 4.11, we have

\[
|T - T_0| \xrightarrow{p} 0, \qquad |W - W_0| \xrightarrow{p} 0.
\]

Proof. See Appendix C.11 for a detailed proof. □

Lemma B.16. Under the same assumptions of Theorem 4.11, there exist $\zeta_1 > 0$ and $\zeta_2 > 0$, depending on $n$ and typically $\zeta_1 \to 0$, $\zeta_2 \to 0$ as $n \to \infty$, such that

\[
P(|T - T_0| > \zeta_1) < \zeta_2, \tag{B.50}
\]
\[
P\big(P_e(|W - W_0| > \zeta_1) > \zeta_2\big) < \zeta_2. \tag{B.51}
\]

Proof. See Appendix C.12 for a detailed proof. □


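Lemma B.17. Suppose that condition (B.51) holds. Then for every $\alpha \in (0, 1)$,

\[
P\big(c_{W}(\alpha) \le c_{W_0}(\alpha + \zeta_2) + \zeta_1\big) \ge 1 - \zeta_2,
\]
\[
P\big(c_{W_0}(\alpha) \le c_{W}(\alpha + \zeta_2) + \zeta_1\big) \ge 1 - \zeta_2.
\]

Proof. By (B.51), the probability of the event $\{P_e(|W - W_0| > \zeta_1) \le \zeta_2\}$ is at least $1 - \zeta_2$. On
this event,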

Pe(Wo < cwo{oi + C2) + Cl) > Fe{Uo < C[/o(« -b 7 r(i?))) - 7 r(i?) > a, 
which implies the first claim. The second claim follows similarly. □
Proof of Theorem 4.14. Recall that


7r{^) = V log{\E^\/^)f/^. 

In addition, let ki(??) = cz^ia — (2 — 7r('*?)) and K 2 ('d) = czgia + C 2 + '^(^))- Then by 
Lemmas B.12, B.14 and B.17, we have 

^{{T < cw{a),To > cuoioi)} U {T > cw{(y),TQ < cuoia)}) 

< P(ki(^ 9) - 2Ci < To < K2{^) + 2Ci) + 2P(A > ??) + 3 C 2 

< P(ki(^?) - 2Ci <Uo< K2{^) + 2Ci) + 2 P(A > ^?) + 8(2 + ^CtT^ 

< 27r(i9) + 2 P(A >i9) + 2Cn-^ + C 3 C 1 V log(|T"|/Ci) + 5 C 2 . 


□ 


C Proof of Auxiliary Lemmas in Section B 

C.1 Proof of Lemma B.1

Lemma C.l. Suppose 0* G lA{s,K). Under Assumptions 4.1 and 4.2, we have 

^(b^Rb)”^b^vec(V4(0*)) ^ AI(0,1). 

where b = [1, —(w* 

Proof of Lemma B.1. By Lemma C.1, replacing $\Theta^*$ with $(0, \Theta^*_{(jk)^c})$ completes the proof. □


C.2 Proof of Lemma B.2 

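Proof. By definition, we have

\[
\begin{aligned}
\|\nabla^2 \ell_n(\hat\Theta) - H\|_{\max}
&= \|\hat\Theta^{-1} \otimes \hat\Theta^{-1} - \Theta^{*-1} \otimes \Theta^{*-1}\|_{\max} \\
&\le \|\hat\Theta^{-1} \otimes (\hat\Theta^{-1} - \Theta^{*-1})\|_{\max} + \|(\hat\Theta^{-1} - \Theta^{*-1}) \otimes \Theta^{*-1}\|_{\max} \\
&\le \underbrace{\|(\hat\Theta^{-1} - \Theta^{*-1}) \otimes (\hat\Theta^{-1} - \Theta^{*-1})\|_{\max}}_{\text{(i)}} + \underbrace{2\|(\hat\Theta^{-1} - \Theta^{*-1}) \otimes \Theta^{*-1}\|_{\max}}_{\text{(ii)}}.
\end{aligned}
\tag{C.1}
\]

For term (i), we have (i) $\le \|\hat\Theta^{-1} - \Theta^{*-1}\|_{\max}^2$. For term (ii), we have

\[
\text{(ii)} \le 2\|\hat\Theta^{-1} - \Theta^{*-1}\|_{\max} \cdot \|\Sigma^*\|_2 \le 2\nu \|\hat\Theta^{-1} - \Theta^{*-1}\|_{\max},
\]

where the second inequality follows from Assumption 4.2. Substituting terms (i) and (ii) into (C.1), we obtain

\[
\|\nabla^2 \ell_n(\hat\Theta) - H\|_{\max} \le \|\hat\Theta^{-1} - \Theta^{*-1}\|_{\max}^2 + 2\nu \|\hat\Theta^{-1} - \Theta^{*-1}\|_{\max}.
\tag{C.2}
\]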
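The Kronecker-product bound above can be checked numerically. The sketch below uses a random symmetric positive definite matrix `S` as a hypothetical stand-in for $\Theta^{*-1}$ and a small symmetric perturbation `D` for $\hat\Theta^{-1} - \Theta^{*-1}$; it verifies the exact three-term expansion of the difference of Kronecker products, the fact that the entrywise max norm is multiplicative over Kronecker products, and the resulting bound with the spectral norm in place of $\nu$.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

# Hypothetical stand-ins: S for the inverse of Theta*, D for the estimation error.
B = rng.normal(size=(d, d))
S = B @ B.T + d * np.eye(d)              # symmetric positive definite
D = 0.01 * rng.normal(size=(d, d))
D = (D + D.T) / 2                        # keep the perturbation symmetric
A = S + D                                # plays the role of the estimated inverse

def max_norm(M):
    """Entrywise maximum absolute value."""
    return np.abs(M).max()

# Exact expansion: A (x) A - S (x) S = D (x) D + D (x) S + S (x) D.
diff = np.kron(A, A) - np.kron(S, S)
decomp = np.kron(D, D) + np.kron(D, S) + np.kron(S, D)

# Bound of the same form as (C.2), using ||S||_max <= ||S||_2.
bound = max_norm(D) ** 2 + 2 * max_norm(D) * np.linalg.norm(S, 2)

print(np.allclose(diff, decomp), max_norm(diff) <= bound)
```

Because the max norm of a Kronecker product equals the product of the max norms, the quadratic term is negligible once the perturbation is small, which is why the linear term drives the final rate.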
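From (C.2), we can see that in order to bound $\|\nabla^2 \ell_n(\hat\Theta) - H\|_{\max}$, it is sufficient to get a bound on $\|\hat\Theta^{-1} - \Theta^{*-1}\|_{\max}$. Thus, in the following, we aim at bounding $\|\hat\Theta^{-1} - \Theta^{*-1}\|_{\max}$.

Write $\Delta = \hat\Theta - \Theta^*$ and consider the Taylor expansion of $\hat\Theta^{-1}$ at $\Theta^*$ as follows:

\[
(\Theta^* + \Delta)^{-1} = (I + \Theta^{*-1}\Delta)^{-1}\Theta^{*-1} = \sum_{k=0}^{\infty} (-1)^k (\Theta^{*-1}\Delta)^k \Theta^{*-1}. \tag{C.3}
\]

The right hand side of (C.3) can be further rewritten as

\[
\sum_{k=0}^{\infty} (-1)^k (\Theta^{*-1}\Delta)^k \Theta^{*-1} = \Theta^{*-1} - \Theta^{*-1}\Delta \sum_{k=0}^{\infty} (-1)^k (\Theta^{*-1}\Delta)^k \Theta^{*-1} = \Theta^{*-1} - R(\Delta), \tag{C.4}
\]

where $R(\Delta) = \Theta^{*-1}\Delta J \Theta^{*-1}$ and $J = \sum_{k=0}^{\infty} (-1)^k (\Theta^{*-1}\Delta)^k$. Thus, combining (C.3) and (C.4), we obtain

\[
\hat\Theta^{-1} - \Theta^{*-1} = -R(\Delta). \tag{C.5}
\]

From (C.5), in order to bound $\|\hat\Theta^{-1} - \Theta^{*-1}\|_{\max}$, it is equivalent to bound $\|R(\Delta)\|_{\max}$. We have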
R(A) 


= max e7 0*-iAJ0*-^ej < 


0 


*-l| 


< Il0 


*-ii 


0 


OO 

*-lll 


AJ 




0 


*-il 


(C.6) 


where the second inequality follows from $\|AB\|_{\max} \le \|A\|_{\max} \cdot \|B\|_\infty$, and the third inequality
follows from Assumption 4.2. Recall that $J = \sum_{k=0}^{\infty} (-1)^k (\Theta^{*-1}\Delta)^k$; we have


OO 

u<E 

k=0 




OO 

I 

loo — / ^ I 
k=0 


|0*-^A| 


< 


1 - 0*-i 


< 


1-As* A 


where the first inequality follows from the triangle inequality, the second inequality follows from
the submultiplicativity of the matrix $\|\cdot\|_\infty$ norm, and the last inequality follows from Assumption 4.2.
Since ||A||oo = ||A||i = 0{s^\og d/n), when n is sufficiently large, we have As*||A||oo < 1/2, 
and therefore 

\[
\|J\|_\infty \le \frac{1}{1 - 1/2} = 2. \tag{C.7}
\]

Substituting (C.7) into (C.6) and (C.5), we obtain 

||0"^ - 0*-7|max = ||R(A)|Uax < 2A|* || A||„,ax = (C.8) 










where the last equality follows from $\nu = O(1)$. Finally, substituting (C.8) into (C.2), we get


v24(0) - H 



This completes the proof. 


□ 
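The Neumann-series argument used in this proof can be illustrated numerically. The sketch below, with hypothetical random matrices `Theta_star` and `Delta`, truncates the series for $J$ after 60 terms (an arbitrary choice that is ample when $\|\Theta^{*-1}\Delta\|_\infty$ is well below one) and checks the identity $(\Theta^* + \Delta)^{-1} = \Theta^{*-1} - R(\Delta)$ together with the geometric-series bound on $\|J\|_\infty$.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5

# Hypothetical stand-ins for Theta* and the perturbation Delta.
B = rng.normal(size=(d, d))
Theta_star = B @ B.T + d * np.eye(d)
Delta = 0.05 * rng.normal(size=(d, d))
Delta = (Delta + Delta.T) / 2

def inf_norm(A):
    """Matrix infinity norm: maximum absolute row sum."""
    return np.abs(A).sum(axis=1).max()

S = np.linalg.inv(Theta_star)            # Theta*^{-1}
M = S @ Delta                            # the matrix whose powers form the series

# Truncated series J = sum_k (-1)^k M^k; the tail is negligible here.
J = sum(np.linalg.matrix_power(-M, k) for k in range(60))
R = S @ Delta @ J @ S                    # R(Delta) as defined after (C.4)

lhs = np.linalg.inv(Theta_star + Delta)
print(np.allclose(lhs, S - R), inf_norm(J) <= 1.0 / (1.0 - inf_norm(M)))
```

The series converges only when the perturbation is small relative to $\Theta^*$, which is the role played by the "n sufficiently large" condition in the proof.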


C.3 Proof of Lemma B.3 

Proof. In this proof, we use $\hat w$ as the shorthand for $w(\hat\Theta)$. By the definition of $\hat w$ and $w^*$
in (3.7), we have


w — w 


|i — 


< 


n 


{jkp,{jk) 




+ 


n 


{jkp,{jk) 


Uk),ijk) 


ft 


Uk),ijk) 



(0 (S' ® ) (j/c)'^,(jfc) 

( 0 * ® 0*)Qfc)c,(7fc) 

1 

(0 (S ®)(j/c),(jfc) 

( 0 * ® ®*){jk),{jk) 


(© (8> ^')(jkp,{jk) (© ® © ') {jkY,{jk) 


(0 ® ®'){jk),(jk) (© ® ®'^(jk),{jk) 

“V'' 

(i) 


+ 


(0 (g) 0 ){jkY,(jk) 

(0 (g) 0 ){jkY,{jk) 

(0 (g) G)(jk),{jk) 

( 0 * (g) ®*){jk),{jk) 1 


(ii) 


(C.9) 


In the sequel, we bound terms (i) and (ii) respectively. 

Bounding term (i): We have 


(i) = 


^© 0 Q )(jfc),(jfc)^ 


|(© (g) @)(jkY,{jk) - (©* ® ®*)(ife)g(iA:)|ll 


(i).b 


(C.IO) 


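To bound term (i).a, we have

\[
\begin{aligned}
(\hat\Theta\otimes\hat\Theta)_{(jk),(jk)} &\ge (\Theta^*\otimes\Theta^*)_{(jk),(jk)} - |(\hat\Theta\otimes\hat\Theta)_{(jk),(jk)} - (\Theta^*\otimes\Theta^*)_{(jk),(jk)}| \\
&\ge \frac{1}{\nu^2} - |(\hat\Theta\otimes\hat\Theta)_{(jk),(jk)} - (\Theta^*\otimes\Theta^*)_{(jk),(jk)}|,
\end{aligned}
\tag{C.11}
\]

where the second inequality uses the fact that $(\Theta^*\otimes\Theta^*)_{(jk),(jk)} \ge \lambda_{\min}(\Theta^*\otimes\Theta^*) \ge \lambda_{\min}(\Theta^*)^2 \ge 1/\nu^2$ under Assumption 4.2. It remains to upper bound $|(\hat\Theta\otimes\hat\Theta)_{(jk),(jk)} - (\Theta^*\otimes\Theta^*)_{(jk),(jk)}|$ in (C.11). We have

\[
\begin{aligned}
|(\hat\Theta\otimes\hat\Theta)_{(jk),(jk)} - (\Theta^*\otimes\Theta^*)_{(jk),(jk)}| &= |\hat\Theta_{jj}\hat\Theta_{kk} - \Theta^*_{jj}\Theta^*_{kk}| \\
&\le |\hat\Theta_{jj}\hat\Theta_{kk} - \Theta^*_{jj}\hat\Theta_{kk}| + |\Theta^*_{jj}\hat\Theta_{kk} - \Theta^*_{jj}\Theta^*_{kk}|.
\end{aligned}
\tag{C.12}
\]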
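The right hand side of (C.12) can be bounded by

\[
\begin{aligned}
|\hat\Theta_{jj}\hat\Theta_{kk} - \Theta^*_{jj}\hat\Theta_{kk}| + |\Theta^*_{jj}\hat\Theta_{kk} - \Theta^*_{jj}\Theta^*_{kk}|
&\le |\hat\Theta_{jj} - \Theta^*_{jj}| \cdot |\hat\Theta_{kk}| + |\Theta^*_{jj}| \cdot |\hat\Theta_{kk} - \Theta^*_{kk}| \\
&\le \|\hat\Theta - \Theta^*\|_2 \big(\|\hat\Theta - \Theta^*\|_2 + \nu\big) + \nu \|\hat\Theta - \Theta^*\|_2 \\
&= O_P\big(s\sqrt{\log d/n}\big),
\end{aligned}
\tag{C.13}
\]

where the second inequality uses $|\Theta^*_{jj}| \le \lambda_{\max}(\Theta^*) \le \nu$ and $|\hat\Theta_{kk}| \le |\hat\Theta_{kk} - \Theta^*_{kk}| + |\Theta^*_{kk}| \le \|\hat\Theta - \Theta^*\|_2 + \nu$, and the equality follows from $\|\hat\Theta - \Theta^*\|_2 = O_P(s\sqrt{\log d/n})$ and $\nu = O(1)$.

Substituting (C.13) into (C.12), we obtain

\[
|(\hat\Theta\otimes\hat\Theta)_{(jk),(jk)} - (\Theta^*\otimes\Theta^*)_{(jk),(jk)}| = O_P\big(s\sqrt{\log d/n}\big). \tag{C.14}
\]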

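Substituting (C.14) into (C.11), we can obtain that

\[
(\hat\Theta\otimes\hat\Theta)_{(jk),(jk)} \ge \frac{1}{2\nu^2}
\]

when $n$ is sufficiently large. Therefore, we have (i).a $= 1/(\hat\Theta\otimes\hat\Theta)_{(jk),(jk)} \le 2\nu^2$. Now we are going to bound term (i).b. We have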

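\[
\begin{aligned}
\text{(i).b} &\le \|((\hat\Theta - \Theta^*)\otimes(\hat\Theta - \Theta^*))_{(jk)^c,(jk)}\|_1 + \|((\hat\Theta - \Theta^*)\otimes\Theta^*)_{(jk)^c,(jk)}\|_1 + \|(\Theta^*\otimes(\hat\Theta - \Theta^*))_{(jk)^c,(jk)}\|_1 \\
&\le \underbrace{\|(\hat\Theta - \Theta^*)_{*j}\otimes(\hat\Theta - \Theta^*)_{*k}\|_1}_{I_1} + \underbrace{\|(\hat\Theta - \Theta^*)_{*j}\otimes\Theta^*_{*k}\|_1}_{I_2} + \underbrace{\|\Theta^*_{*j}\otimes(\hat\Theta - \Theta^*)_{*k}\|_1}_{I_3}.
\end{aligned}
\]

For $I_1$, we have

\[
I_1 = \|(\hat\Theta - \Theta^*)_{*j}\|_1 \cdot \|(\hat\Theta - \Theta^*)_{*k}\|_1 \le \|\hat\Theta - \Theta^*\|_1^2 = O_P(s^2\log d/n).
\]

For $I_2$, we have

\[
I_2 \le \|(\hat\Theta - \Theta^*)_{*j}\|_1 \cdot \|\Theta^*_{*k}\|_1 \le \|\hat\Theta - \Theta^*\|_1 \cdot M = O_P\big(s\sqrt{\log d/n}\big),
\]

where the second inequality follows from the fact that $\Theta^* \in \mathcal{U}(s, M)$. Similarly, we have $I_3 = O_P(s\sqrt{\log d/n})$. Combining terms $I_1$, $I_2$ and $I_3$, we obtain that (i).b $= O_P(s^2\log d/n + s\sqrt{\log d/n})$. Combining terms (i).a and (i).b and substituting them into (C.10), we obtain that

\[
\text{(i)} = O_P\Big(s^2\,\frac{\log d}{n} + s\sqrt{\frac{\log d}{n}}\Big). \tag{C.15}
\]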
Bounding term (ii): We have


(ii) = 


(0 (g) 0)(jfc)_Q-fc)(0* (g) &*)(jk),{jk) 


|(0 (g) - (0* (g) ®*){jk),{jk)\ 


{ii).a 

{&* ®&*)^jkr^{jk)\\^- 


(ii).b 


(C.16) 


(ii).c 


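For term (ii).a, by a technique similar to that used for bounding (i).a, when $n$ is sufficiently large, we have

\[
\frac{1}{(\hat\Theta\otimes\hat\Theta)_{(jk),(jk)}\,(\Theta^*\otimes\Theta^*)_{(jk),(jk)}} \le 2\nu^4. \tag{C.17}
\]

For term (ii).b, we have already bounded the same term in (C.14), i.e.,

\[
|(\hat\Theta\otimes\hat\Theta)_{(jk),(jk)} - (\Theta^*\otimes\Theta^*)_{(jk),(jk)}| = O_P\big(s\sqrt{\log d/n}\big). \tag{C.18}
\]

For term (ii).c, we have

\[
\text{(ii).c} = \|(\Theta^*\otimes\Theta^*)_{(jk)^c,(jk)}\|_1 \le \|\Theta^*_{*j}\|_1 \cdot \|\Theta^*_{*k}\|_1 \le M^2, \tag{C.19}
\]

since $\Theta^* \in \mathcal{U}(s, M)$. Substituting (C.17), (C.18) and (C.19) into (C.16), we obtain that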




n 


(C.20) 


since $\nu = O(1)$. Combining the above bounds in (C.9), (C.15) and (C.20), we obtain that

\[
\|\hat w - w^*\|_1 = O_P\Big(s^2\,\frac{\log d}{n} + s\sqrt{\frac{\log d}{n}}\Big).
\]


□ 


C.4 Proof of Lemma B.5 

Proof. Denote R[jk),{j'k') by 

1 

^ ^ i=i 




1 ’^j'k'), 
where = hj^i^jk) ■ hj"^,{'Sj^k') is calculated as

^ ^ sign X^'j^ i^Xik ^i'fc)] “1“ arcsin (Yljk^ ^ 

• I - ^ sign [{Xij> - Xiny) [Xik' - XiMk')] + arcsin (Sj/fc/) |. 

Recall that R = R(S“^). We have 




+||R(I]*-^)-Rll 

11 max M ^ ' II max 

-^ ^ 


(i) 


(ii) 


In the following, we bound terms (i) and (ii) respectively. 

Bounding Term (i): Let . hjj'’*" (S^-^, S^v^,). 


Since 




'jk 


dS 


'jk 


1 - 




+ { - ^ sign [{Xij - Xi^j) [Xik - Xi,k)] + arcsin (Sj/fc/) | 

By Assumption 4.2 and diag(5]*) = I, we have Tj*^ < C < 1 for j ^ k, we have for any 
T,jk = + t{T,jk - with t G [0,1] that 


^^{jk),{j'k')^'^j^' '^j'k') 


dS 


jk 


< - TT + TT, 


^jk—^jk 




(C.21) 


where C is a constant. Therefore, by mean value theorem, we have 

\^ijk),{j'k'){'^ ) ~ ^(jk),{j'k')i^ )| 

^^(jk),{j'k')^'^j>^' '^j'k') 


n(n — 1)2 

'' '' i=l lYi.i'Y* 


+ 


nin — 1)2 


dT,jk 

^^{jk),(j'k')^j^'’ '^j'k') 


^jk—^jk 


^jk - ^k 


< I - TT 


^ ) • (I ^jk ~ ^^k I + I ^j'k' ~ 51 


^jk —^j/e 


Sj/fc/ — S*/^/ \ 


j'k’ I ) : 


(C.22) 


where Jjjk = ^*jk + ^(^jk - ^*jk) for some t G [0,1], and the last inequality follows from (C.21). 
Using (C.22), by the union bound, we have


(i) = ||R(E-‘) -R(E-‘)||„„ <2. +,) ■ ||S - E||„„ = Orf/^Y 


v'l^ 


n 


(C.23) 


Bounding Term (ii): By the definition of R and we have 


jk 




Uk),{fk') 


n 


2 = 1 


1 

+ E E ■ E K’? ■ Y'Yl 

^ 2=1 2^^ 

-72 - 72 

^ > i=i ^ > i=i i'^i 


where the last step follows from the fact that and are independent given Xj. Recall 
that h^{]'k([fk') = 

[1?(S ')]ijk),(j'k') ^{jk),(j'k') 


n{n — 


4^ E 


n-2 1 

+ 


n — 1 n — 1 1 nin — 1) ^ 

i'T^i 


h 

E^‘-f>'4YY-YY-E[/>S'‘-Y;S 


(C.24) 


For term $I_1$, it can be seen that $I_1$ is a zero-mean third-order U-statistic. Notice that

Hoeffding’s inequality for U-statistics (Hoeffding, 1963), we 

have 


IP(|2i| >t)< 2exp ( - ^ 


n 


(C.25) 


For term $I_2$, it can be seen that $I_2$ is a zero-mean second-order U-statistic. Moreover, we
have \FjkFjik' • < ^vr^. Thus, by Hoeffding’s inequality for U-statistics (Hoeffding, 

1963), we have 


P(|/ 2 | > t) < 2exp - 


/ Ct^ 

n 

V Idvr'^ 

L 2 J 


(C.26) 
Substituting (C.25) and (C.26) into (C.24), and invoking the union bound, we have


n — 1 ^ 
n 


/ Ct‘^ 

n 

V 47r^ 

-3- 


+ 


1 


-2(f exp ( — 


Cf n 

1^12 


which immediately yields 


fii) = -Rll 


= Op {y/log d/n). (C.27) 

Finally, we combine (C.23) and (C.27) and complete the proof. □ 
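The kernels in this proof are built from $\mathrm{sign}[(X_{ij} - X_{i'j})(X_{ik} - X_{i'k})]$ and $\arcsin(\Sigma_{jk})$, and they are centered because of the classical identity $\mathbb{E}\,\mathrm{sign}[\cdot] = (2/\pi)\arcsin(\Sigma_{jk})$ for Gaussian copulas. The sketch below is a simulation under assumed settings (correlation `rho = 0.5`, sample size `n = 1500`, and the monotone transformations `exp` and cubing are all illustrative choices): Kendall's tau is invariant to the unknown marginal transformations, and $\sin(\pi\hat\tau/2)$ recovers the latent correlation.

```python
import numpy as np

def kendall_tau(x, y):
    """Kendall's tau as a second-order U-statistic over all ordered pairs i != i'."""
    n = len(x)
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    return (sx * sy).sum() / (n * (n - 1))

rng = np.random.default_rng(0)
rho, n = 0.5, 1500                        # illustrative latent correlation and sample size
cov = np.array([[1.0, rho], [rho, 1.0]])
latent = rng.multivariate_normal(np.zeros(2), cov, size=n)

# Observed data: strictly increasing transformations of the latent Gaussian pair.
x = np.exp(latent[:, 0])
y = latent[:, 1] ** 3

tau_latent = kendall_tau(latent[:, 0], latent[:, 1])
tau_obs = kendall_tau(x, y)
sigma_hat = np.sin(np.pi / 2 * tau_obs)   # rank-based estimate of the latent correlation

print(tau_latent, tau_obs, sigma_hat)
```

Since the kernel takes values in a bounded interval, Hoeffding-type inequalities for U-statistics apply without any moment assumptions on the observed data.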

C.5 Proof of Lemma B.6 

Proof. Recall that we have 5^(0) = ej0^S0efc ~ Qjk/i&jjQkk), so we can obtain 

i^jj^kk + ®‘jk] 




{&jj&kk) 

0jfc(^fc*0*fc + — l) 0jfc [0jj0fcfc + ®jfc] 


Qjj&kk (0ji0fcfc)" 


= |0ifc| • 


([S0]fcfc + [510] jj — 2^QjjQkk — 0 


jk 


{QjjQkk) 

The right hand side of (C.28) can be further bounded by 

([^0]fcA; + [^0]ii ~ 2) 


(C.28) 


|0jfe| 


Qjj^kk 


+ |0jfc| 


02 


{Ojj&kk) 


(i) 

In particular, we have 

[S0]fcfe = [(^ ~ S)0]fcfc + 1 < ||S — S| 


(ii) 


|0||l + 1 < 


logd 


M + 1 


n 


(C.29) 


Substituting (C.29) into (i), we obtain (\) < K ■ n ^y^log d/nM/^u"^). For term (ii), we have 
(ii) < ■ n~^^/{u^). Thus, when r/ > 1/2, we have 


>S'n (0jfc) 0(jfc)'=) ‘S'n (O) 0(jfc)'^) 


^jk [Qjj®kk + ©jfc] 


<C-K-n-^ 


logd 


n 


(^QjjQkk) 

Since the above argument is for any $\Theta \in \mathcal{U}_1(K, \eta, s^*, M^*)$, this completes the proof.


□ 
C.6 Proof of Lemma B.7

Proof. The proof is similar to Lemma B.5. We define several events as follows: 

fi = {ll§-0"‘IL„< Civ'll}. 

We have 


-R| 


= ||R(S"^) -R(S 




(i) 


. + ||R(5]-^) -R| 

' ' - - 

(ii) 


In the following, we will bound terms (i) and (ii) respectively. 

Bounding Term (i): By the similar argument as in the proof of Lemma B.5, we have 

(i) = ||R(S-1) - R(S-1)||^^^ < 2 • (+ tt) • ||S - SlUax. (C.30) 

Under event £i, we obtain that 


R(S-i) - R(51-')L,„ < C • 


(C.31) 


Bounding Term (ii): By the similar proof of Lemma B.5, we have 


E 


nin — l){n — 2) 

\ /V / 






n-2 1 

+ 


n — 1 n — 1 1 n{n — 1) ^ 

i'^i 


h 

E 7-1 j-p liili liili TTrrL^^K 

FjkFfk' V -^j'k'-Hhjk -hfl 


jk ‘'j'k'i 


(C.32) 


h 


For term $I_1$, it can be seen that $I_1$ is a zero-mean third-order U-statistic. Notice that

® ^ j M*), by Hoeffding’s inequality for U- 

statistics (Hoeffding, 1963), we have 


(C.33) 


for all 0 G Ui{K,r], s*,M*). For term I 2 , it can be seen that I 2 is a zero mean second-order 
U-statistic. Moreover, we have \FjkFj'ki ■ < 47r^ for all 0 G Ui{K, rj, s*, M*). Thus, 

by Hoeffding’s inequality for U-statistics (Hoeffding, 1963), we have 


/ Ct^ 

n 

V 47r2 

-3- 


P(|/ 2 | > t) < 2exp ( — 


/ Ct^ 

n 

V 167r^ 

L 2 J 


(C.34) 
for all $\Theta \in \mathcal{U}_1(K, \eta, s^*, M^*)$. Substituting (C.33) and (C.34) into (C.32), and invoking the union
bound, we have


R(5] - R||max >t) < 


re — 1 


re 


/ Ct‘^ 

re 

V 

-3- 


+ 


re — 1 


/ Ct^ 

re 

V 167r'^ 

L 2 J 


for all 0 G Ui{K,r], s*,M*). It immediately yields that, 

||R(I]-i)-R||^^<C'Vlogd/n. 
Finally, we combine (C.30) and (C.35), yielding 


(C.35) 


inf P© R-R <C' 


logd\ 


> 


inf 


re J ®eUiiK,rj,s*,M*) 


1 - 


This completes the proof. 


□ 
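The tail bounds for $I_1$ and $I_2$ follow the standard pattern of Hoeffding's inequality for U-statistics: for an order-$m$ U-statistic whose kernel takes values in an interval of length `width`, the two-sided tail at level $t$ is at most $2\exp(-2\lfloor n/m\rfloor t^2/\mathrm{width}^2)$. The sketch below evaluates this generic bound and inverts it under a union bound over `n_events` entries; the function names and the numerical inputs are hypothetical, and the constants are those of the generic inequality rather than the specific constants used in the proof.

```python
import math

def hoeffding_u_tail(n, m, t, width):
    """Two-sided Hoeffding bound for an order-m U-statistic on n samples
    whose kernel takes values in an interval of length `width`."""
    k = n // m                             # floor(n / m) independent blocks
    return 2.0 * math.exp(-2.0 * k * t ** 2 / width ** 2)

def union_threshold(n, m, width, n_events, delta):
    """Deviation t at which the union bound over n_events tails equals delta."""
    k = n // m
    return width * math.sqrt(math.log(2.0 * n_events / delta) / (2.0 * k))

# Illustrative numbers: d = 10 gives d^4 = 10^4 index pairs ((j,k),(j',k')).
t_star = union_threshold(n=300, m=3, width=2.0, n_events=10 ** 4, delta=0.05)
print(t_star, 10 ** 4 * hoeffding_u_tail(300, 3, t_star, 2.0))
```

The threshold grows like $\sqrt{\log(\text{n\_events})/n}$, which is the source of the $\sqrt{\log d/n}$ rates that appear throughout the proofs.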


C.7 Proof of Lemma B.8 

Proof. The proof is identical to the proof of Lemma C.1, followed by applying the Berry-Esseen
theorem (Van der Vaart, 1998). □

C.8 Proof of Lemma B.9 

Proof. It is similar to the proof of Lemma B.3. Hence we omit it. □ 


C.9 Proof of Lemma B.10

Proof. First, we prove (B.16). By a proof similar to that of Lemma B.2, we obtain

||0 ^ — 0 ^llmax = ||R(^)||max ^ || A||max- 


By Assumption 4.9 and the fact that $\nu = O(1)$, we have


lim inf P© 



- 0 


-ll 



= 1 , 


(C.36) 


where Os is a constant. 

Now we prove (B.17). In particular, we have || — S -|- 0~^|| < || — H -|- S|| -|- 

^ \ ^ ’ II Umax — II Umax 

0“^ -|- 0~^||jaax' Theorem 4.2 in (Liu et ah, 2012) and (C.36), we have 


lim inf P© 

n-^oo &&Ui{K,ri,s*,M*) 


S + 0-1 


max 



= 1. 


This completes the proof. 


□ 
C.10 Proof of Lemma B.11


Proof. The proof of this lemma is analogous to the proof of Theorem 4.7, but studies the
local alternative hypothesis, i.e., $\mathcal{U}_1(K, \eta, s^*, M^*)$. In particular, we define several events:

To = {||0 - 0|L,, < C'ov 1^^},T6 = III©-' - £|Uax < 

Ts = I ||w(0) - w||^ < }• 

We first prove (B.18). By Taylor expansion, we obtain 

5n(0, ©(jfc)O = Sn{@) - K ■ + (w - w)’^vec(-SQfc)c + [0-']Qfc)c) 

'-V-' 

(i) 

-[©-'life + [0"^]jfc - w~’’vec([©-'](jfc)c - [0-'](jfe)c) +S'n(O,0(jfe)c) - Sn{&) + K ■ . 

'-V-' '-V-' 

(ii) (iii) 

(C.37) 

In the following, we are going to bound terms (i), (ii) and (iii) respectively. 

Bounding term (i): We have that 

(i) < ||w-w||^- ||vec(-£(jfc)c + [0-'](^-fc)c)||^ < ||w-w||^- II - S + 0-'||^^, (C.38) 


where the first inequality follows from Holder’s inequality, and the second inequality follows 
from ||vec(A)||oo = ||A||max- Given event T 3 and To, we have 


(w-w)’^vec(-S(jfc)c + [0 '](jfc)c) 


<^ 3^6 




3/2 


+ S 


logd 


n 


(C.39) 


Bounding term (ii): By the similar proof of Theorem 4.7, we obtain 

(ii) = - (w, -H(jfc)c_Q-fc)cvec(A(jfc)c)) + [Ri(A)]jfc + (wr, vec([Ri(A)]Q-fc)c)) . 


(ii).a 


(ii).b 


(ii).c 


(C.40) 


By the dehnition of w = have (ii).a = 0. It remains to bound 

terms (ii).b and (ii).c. Recall that in Theorem 4.7, we have proved that ||Ri(A)||^^ = 
Op(slogd/n). Then given event To, we have 

|[Ri(^)](ifc)-| < ||R-i('^)|Lax - Goslogd/n. 
Similarly, we can show that


(ii).c < ||w||i • ||[Ri(A)](jfc)c||^ < ||w||i • ||Ri(A)||max- 
Thus, given event £o, we have 

(w, vec ([Ri(A)](,fc)c)>| <C"slogd/n, 

since ||w||i < Combining terms (ii).a, (ii).b and (ii).c, under event £q, we can show 

that 

- [®~^]jk + - w’^vec([0"^](jfc)c - [©■^](jfc)c) < C'"logfi/n, (C.41) 

Bounding term (iii): By Lemma B.6, we have

Sn{0, @(jkr) - Sn{&) + K • n-V <Co-K- (C.42) 


Let r]{n) = ^JnjK Cj,C%s^{^o'gdlnf‘l‘^ + Cs\ogd/n + Cq ■ K ■ n ’^y^log d/n , and substi¬ 
tute (C.39) and (C.41) and (C.42) into (C.37), together with Lemma B.9, we obtain 


inf P© 

&mi{K,ri,s* ,M*) 


> inf 

&eUi{K,ri,s* ,M*) 


s (0, ©(jfc)c) - S (o, Q{jk)<^) < vin) 
P©(To n T3 n £q) 


>1- sup P©(£’o)- sup P©(T 3 )- 

&£Ui{K,'q,s*,M*) &£Ui{K,'q,s*,M*) 


sup ^©(Te) 1, 


as n —>■ 00 . 

Now we are going to prove (B.19). Let £ = £0 (1 £3 (1 £q. Note that 

F&{^/nS{ 0 ,&^jk)o)/{ 2 a) <t) <F&{y/nS{ 0 ,@(^jk)c)/{ 2 a) <t,£) +P©(T'') 

< P© (^/^5(0, 0(,fc)c)/(2a) < t + + r?(n)) + P©(T'^). 

Therefore, we have 


P©(VnS(O,0Qfc)c)/(2(T) < t) -$(t) 

< P© (^v/^S(0, 0(,fc)c)/(2cT) < t + + 7?(n)^ - 4>(t) + P©(T^) 

( ~2 \ ~2 
^/nS{0,&{jk)<=)/ (2cj) < t + + r?(n)J - 4>(f + + r]{n)) 

~2 

+ ^ ^ Pe(T"). 
By Lemma B .8 and the fact that (i + if /{2a) + rj{n)) — ^{t) < {Kn^l‘^ ^it^/(2cj) +
ry(n)) and r/ > 1 / 2 , we obtain that 

limsup sup sup ("p©(-v/n*S'(0, 0(jfc)c)/(2(T) < t)—$(!)')< 0. (C.43) 

rn-oo ©eWi(X,»?,s*,iW*) teR ^ ' 

Following a similar argument, we can also show that 

‘sis'ea.^ 0 " m) > 0. (0,44) 

Combining (C.43) and (C.44), and using the fact that lim„_^oo sup 0 gj^^(j^,^ P©(<S‘^) = 0 

completes the proof of (B.19). Note that (B.20) can be proved by the same argument, so
we omit it. Finally, we are going to prove (B.21). Given event $\mathcal{E}$, it is easy to see that
\y/nSniO, 0 (jfc)c)/ 2 o-| < t implies that pmin < y/nSn{0, GQk)o)/2a < pmax, where 


~2 ~2 

Pmin = -t- r]{n) + p^ax = i + p{n) + 

Za Z(j 


We can show that 


F&{\VnSn{0,®(jk)^)/2a\ <t) < P©(pmin < VnSn{0,&(jk)^)/2a < Pmax) +p©(f'') 

^ IP© (pmin ^ ^ Pmax) IP(Pmin ^ ^ Pmax) 

+ P(Pmin £1 Z < Pmax) ~ 

where Z ~ N(0,1). By Lemma B.8, we have 

limsup sup |P©(pmin ^ \/ii'S'n(0, G(^jk'jc)/2a ^ Pmax) P(Pmin ^ Z < Pmax)| — 0. 
n —>-00 ©gWi 

In addition, when p < 1/2, if if < 0, we have P(pmin < Z < pmax) < 'h(pmax) —t 0 uniformly 

over 0 G iii(if, p, s*, M*) as n — )• oo. Furthermore, we have that limsup„_^(^ ^^'9&&Ai{K,eta,s* ,M*) IP©(‘^‘^) 

0. The same arguments apply to the case that if > 0, since P(pmin < Z < pmax) < 

1 — 4>(pmin) 0 uniformly over 0 G iii(if, p, s*,M*) as n —)> oo. This completes the proof 
of (B.21) □ 

C.11 Proof of Lemma B.15

In order to prove Lemma B.15, we need the following auxiliary lemmas. 

Lemma C.2. Let = Hl/nXlILi © G*) • = maxi<j-^<^11/re XlILi ^jk ' ■ ei\. 

We have $A_n = O_P(\log d/\sqrt{n})$.

Lemma C.3. Let Bn = |b^l/-v/re^(L;^ vec(F © (G* — G*)) ■ ei\. We have Bn = op(l). 

Proof of Lemma B.15. We prove this lemma in two steps. First, we obtain a bound
on $|T - T_0|$; then we bound $|W - W_0|$.

Bounds for $|T - T_0|$: By the triangle inequality, we have

\[
|T - T_0| \le \underbrace{|T - T_1|}_{\text{(i)}} + \underbrace{|T_1 - T_2|}_{\text{(ii)}} + \underbrace{|T_2 - T_0|}_{\text{(iii)}}.
\]


For term (i), we have 


r-Ti 


< 

< 


max 

{j,k)£[d]x[d\ 


n( &jk - 0 


jk 


H, 


i jk)\{jk)<= 

2 a 


— max 
0',fc)e[rf]x[d] 


^ ( ^jk 0 


jk 


®jk ®jk^ ' (^^{jk)\{jk)‘^ / ^{jk)\{jkY / 


max y/n ■ 

(j,fc)e[ci]x[d] 

. . r . 1 “ ®jk \ ■ \^ijk)\ijkY/{‘^^) ~ -^(ifc)|Qfc)=/(2<7)|. 

(j,k)e[d\x[d\ 


^ijk}\UkY 

2 d 


(C.45) 


Since we have \Qjk - 0*^1 < ||0 - 0*||max = Op(si/log d/n), and \\H(^jk)\(jkY/{‘^(^) - 
//(j7j)l(jfc)c/(2CT)|| = op(l), we obtain \T — Ti\ = op(l). For term (ii), we have 


|Ti -Tal 


max 

(j,fc)e[d]x[(i] 



^{jk)\{jkY/ (^O") 


max y/nSn{®*) / { 2 a) 
(j,fc)e[(i]x[(i] 


< max y/n ■ 
{j,k)£['l\x[d] 


(e,k - e*k) • i^0fc)|0fc)c/(2a) - 5„(0*)/(2a) 


op(l), (C.46) 


where the last equality follows from Theorem 4.11. For term (iii), we have 


\T2 - To 


max y/nSn{&*)/{2a) 

{j,k)mx[d] 


max 

(jk)£[d]x[d] 



n 



2 = 1 


< max 
{j,k)&ld]x[d] 


1 '' 

-j. ^ l/(2a) • b^vec^T © (G* 
^ 2 = 1 



(C.47) 


In the proof of Lemma C.1, we have shown that $|T_2 - T_0| = o_P(1)$. Furthermore, these bounds
are independent of $(j,k)$, which means that after taking the maximum over all $(j,k) \in [d]\times[d]$,
the same bounds still hold. Therefore, we have $|T_2 - T_0| \xrightarrow{p} 0$. Combining terms (i), (ii) and
(iii), we obtain $|T - T_0| \xrightarrow{p} 0$.

Bounds for $|W - W_0|$: We have


I W — Wq I < max 

{jk)&[d]x[d] 


^ ^ ft ^ IL 

_A 7 

y/n a y/n 

i=l 


i=l 


+ max 
{jk)£[d]x[d] 


\a — a\ 


a 


(i) 


n 

n 

2 = 1 

•v" 


(ii) 
By definition, for each $(j,k) \in [d]\times[d]$, we have


(0<^ 


1 

+ - 
a 


^ 1 ” ^ 1 1 ” -_ 
(b-b)^-Vvec(F0G*)ei +- b^—V vec(F © (G* - G*)) 
n a Jn ^ 

i=l ^ i=l 




(i).a 


(i).b 


1 ” - 
b^ — y vec((F - T) 0 G*)ei 

1=1 


(i).c 

In the following, we are going to bound terms (i).a, (i).b and (i).c respectively. For term (i).a, 
we have that 


(i).a < ||b — b*||j^ 


= ||w — w*||^ • O] 


y vec(F©Gy Ci 

{%)■ 


< b-b* 


1 ” 

-EC'sc:*) 


e,; 


i=l 


(C.48) 


where the last equality follows from b — b* = w — w* and Lemma C.2. According to the 
proof of Lemma B.3, we have 


\[
\|\hat w - w^*\|_1 = O_P\Big(s^2\,\frac{\log d}{n} + s\sqrt{\frac{\log d}{n}}\Big). \tag{C.49}
\]

Substituting (C.49) into (C.48), we obtain 

(i).a = Op(^s +s- - - j=op(n / ). (C.50) 

Since the bound in (C.50) does not depend on the specific $(j,k) \in [d]\times[d]$, taking the maximum over
all $(j,k) \in [d]\times[d]$, we have


max 

(j,fe)e[d]x[(i] 


(b — b)^—y vec(F 0 G*) • Cj = op(n 


i=l 


For term (i).b, using Lemma C.3, we have 


bT_^vec{F0(&-G‘)) 


2 = 1 


= op(l)- 


(C.51) 


We see that the bound of (i).b is uniform for all $(j,k) \in [d]\times[d]$. For term (i).c, we have


/1 " — 

(i),c<||bB||i-VS||(F-TWBL„- (-E« 


/BxB 


(C.52) 


where B = supp(b). In the proof of Lemma B.l, we have shown ||(l/nX]r=i = 

Ov^^/logs/n) and ||(T - F)Bxg||jjjax = Op(\/logs/n). Therefore, we have (i).c = Op((l + 
M'^) log d/n) $= o_P(1)$. For term (ii), it is easy to show that (ii) $= o_P(1)$. Combining the
results above, we get $|W - W_0| \xrightarrow{p} 0$. This completes the proof. □

C.12 Proof of Lemma B.16 

Proof. By Lemma B.15, we know that there exist $\xi_1, \xi_2$ depending on $n$, with $\xi_1 \to 0$ and $\xi_2 \to 0$ as $n \to \infty$, such that

\[
P(|T - T_0| > \xi_1) < \xi_2, \qquad P(|W - W_0| > \xi_1) < \xi_2.
\]

Without loss of generality, we can further assume that $\xi_2 \le 1$, because otherwise we can simply
consider $n$ large enough to make this hold. Setting $\zeta_1 = \xi_1$ and $\zeta_2 = \sqrt{\xi_2}$, we obtain

\[
P(|T - T_0| > \zeta_1) < \zeta_2^2 \le \zeta_2, \qquad P(|W - W_0| > \zeta_1) < \zeta_2^2.
\]

By Markov's inequality, we have

\[
P\big(P_e(|W - W_0| > \zeta_1) > \zeta_2\big) \le \frac{1}{\zeta_2}\,\mathbb{E}\big[P_e(|W - W_0| > \zeta_1)\big] = \frac{1}{\zeta_2}\,P(|W - W_0| > \zeta_1) < \zeta_2.
\]

This completes the proof. □


D Proof of Auxiliary Lemmas in Appendix C 

Proof of Lemma C.l. Recall that T = [Tjk] G such that Tjk = rjfc(0*), and G = 

[Gjk] G such that Gjk = n{n-i) Z)i<i<i'<n In addition, we have G* = [G*^] such 

that = l/(n - 1) XliV* ^ifc(®*)- other words, we have Gjk = G'^jk- Since 

Vin{@*) = T © G, we have 


Y(b^Rb) ^/'bTvec(V4(0*)) 

- 1/2 1 


(b^Rb) ^^^^^b^vec(F0G*) + ^(b^Rb) ^/^b^vec(F 0 (G - G)) 

^ i=l '' -V-' 


(i) 


(ii) 


+ bTvec{{T - F) 0 G‘) . 

^ Z=1 


(D.l) 


(iii) 


In what follows, we will bound terms (i), (ii) and (iii) respectively. 

Bounding Term (i): Term (i) is a sum of n independent zero mean random variables. 
In addition, its variance is 1. We now verify the Lyapunov condition for term (i). By 
Assumption 4.2 and ||b ||2 = -\/l + ||w* II? < yr+M w* ||i)2 < Vl + b’^Rb is

lower bounded by (1 + s^i^‘^M^)Amin(R)- In addition, we have < 1, thus 

Fjk = Op(l). Therefore, 


n 


-3/2^e[ (b^Rb) ^/^^^b^vec(F0G*) = Op(l)n-3/2 ^E[|b^vec(F 0 G*)|^ 


i=l 


i=l 


i=\ 


(D.2) 


We have ||b||i = 1 + ||w*||i = 1 + Furthermore, we have 


^ sign [{Xij - Xi>j) [Xik - Aj/fc)] + | arcsin ([ 0 * ^]jk) I - ^ + | = 


Similarly, we can show that and \gjk{Xi,&*)\ < tt. Therefore, we have 

||vec(FoG0L ^ l|F|Lax- IIG*i 


max \\^ max 


< TT, by Holder’s inequality, we obtain 
|b^vec(F 0 G*)| < ||b||i • ||vec(F 0 G*)||^ < 7r(l + 
Substituting (D.3) into (D.2), we obtain 


(D.3) 


n 


3/2 (b^Rb) ^/^^^bTvec(F0G*) "J = Op(n- 3 / 27 r 3 (l + = op(l). 


2=1 


2 = 1 


Thus, by invoking the Lyapunov central limit theorem for term (i), we establish that 

1-1/2 1 


(b^Rb) ^/^^ Vb^vec(F0G*)^ A(0,1). 
\/n 


(D.4) 


2=1 


Bounding Term (ii): We define a matrix V**’ = [VjJ:] such that VJJ: = {&*) — 

(0*). Term (ii) can be rewritten as 


Trkw1/2. 


(ii) = - (b Rb) 

^ ^ 2 ^ ^ n(n- 1) 


^ E b’ 


l<2<2'<n 


vec(F0 V** ). 


In the following, we introduce as the shorthand of h**^(0*) and as the shorthand of 
We consider two cases. For the first case, suppose i ^ , i' < ^ we have 

+e[4I‘/.“;i'] +E[A‘ii‘'/.“;i'] +E[h^‘'i^if]. 

Since E[/z®®^] = E[/zj|'] = 0, and and /z^^' are independent, we can show that E[l/*^'l/vf/] = 
0 for the first case. For the second case, suppose one of i and i' is identical to one of i and T, 


(D.5) 
e.g., i = £, then



Note that we have 


(D.6) 




= E 


= E 


= E 


E[h%h%,\Xi] 

E[h%\x.]h‘‘:!i 


=e['*S'‘'*/ 2 ;] = 0 . 
= = 0 . 
= E[4l-fc-2] =0. 


(D.7) 


Substituting (D.7) into (D.6), we have ^[yjk^fk'] ~ ^ second case. The variance of 

term (ii) can be bounded by 

g(b^Rb)-- ‘ bTE[vec(F0V“>ec{F0V“')]b 

^g(bTRb)-‘ ^^J_ 53 b^E[vec{F0V"')vec(F0V“')]b. 

<71 


Recall that we have b''^Rb is lower bounded by (l+s^i^^M'^)Amin(R) and therefore (b'''Rb)“^ = 
0(1), We have \ VJJ:\ < Svr and \ VJJ^Vpl^,\ < Ovr^. Thus, the variance of term (ii) can be fnrther 
bounded by 


0(l)0p(l) 


n{n — 1)^ 


b^E[vec(V**')vec(V“')]b 


l<i<i' <71 


S ||Ehc{V"')vec{V«')] 

l<i<i'<7i 

= Op(7r^(l + z^^M^)^/(n - 1)) = op(l). 


By Chebyshev's inequality and (D.8), we have (ii) $= o_P(1)$.
Bounding Term (iii): Let $B = \mathrm{supp}(b)$. We have


(hi) < (b-^Rb)-V2||bs||, . ^ f;(T - F) 0 G*) 

^ i=i ^ 


/ BxB 


< (bTRb)-V2|n,^||^ . L_ . ||(T - F)bxb| 


n 


E® 


/BxB 


(D.8) 


(D.9) 


We have already showed that (b^Rb) ^ = 0(1), ||b||i = 1 + . 
By the mean value theorem, we have


\G)k\ - 


n — I 


Vh%(&' 


< 


c 




-jfc -^k\: '^U,k) G [n] X [n], 


(D.IO) 


where the inequality follows from Assumption 4.2 and $\mathrm{diag}(\Sigma^*) = I$, and $C$ is a constant.
By (D.IO) and Theorem 4.2 in (Liu et ah, 2012), we obtain ||(l/?^X]^=l ^*)sx-B||jnax ~ 
Op{^y^og~sJn). Similarly, by mean value theorem, we have 


Tjk — Fik\ = 


jk\ 


< 


cos[t arcsin (S*^) + (1 — t) arcsin(Sjfc)] — cos[arcsin(S*^)] 

C 


{t — 1) arcsin(S*^) + (1 — t) arcsin(Sjfc) 


< 


(D.ll) 


where the last inequality follows from Assumption 4.2 and $\mathrm{diag}(\Sigma^*) = I$, and $C$ is a constant.
Thus, by (D.ll) and Theorem 4.2 in (Liu et ah, 2012), we obtain ||(T — F)gxg|| ma x “ 
Op(y^logs/n). Therefore, we have (hi) = ((1 + \ogs)/^/n = op(l). 

Combining terms (i), (ii) and (iii), and invoking Slutsky's theorem, we complete the
proof. □


Proof of Lemma C.2. We have 


< lEx rnax — 
'■ ■' i<j,k<d n 






^ ^jk ■ G^jk • e* \/41og(i 


2 = 1 


< Ex max —, 

71 


^ i=i 


The right hand side of (D.12) can be further bounded by 


(D.12) 


ETA^I < max 


41ogd 


n 


1 


ExEfe'oy 


2 = 1 


which follows from maximal inequality for sub-Gaussian random variables. Since \Fjk-G),\< 
II F Umax • 11 G* 11 max < TT, we directly get the upper bound as E[A„] < 4 log d/nVnvr^ < 
47rlog(i/y^. We complete the proof by applying Markov’s inequality. □ 
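The maximal inequality invoked here has a simple generic form: conditionally on the data, each $n^{-1/2}\sum_i a_{ij} e_i$ is Gaussian with standard deviation $\sigma_j$, and $\mathbb{E}\max_{j \le N} |g_j| \le \max_j \sigma_j \sqrt{2\log(2N)}$. The sketch below compares a Monte Carlo estimate of the left side with this bound for a hypothetical bounded array `a`; the constant differs from the one displayed in the proof, and the array sizes and number of draws are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 100, 50

# Hypothetical bounded summands, held fixed (the "conditional on the data" step).
a = rng.uniform(-1.0, 1.0, size=(n, N))

# Conditional standard deviation of n^{-1/2} * sum_i a[i, j] * e_i, maximised over j.
sigma_max = np.sqrt((a ** 2).sum(axis=0).max() / n)

# Monte Carlo over the Gaussian multipliers e_1, ..., e_n.
e = rng.standard_normal((5000, n))
stat = np.abs(e @ a / np.sqrt(n)).max(axis=1)
mc_mean = stat.mean()

# Generic Gaussian maximal inequality for the maximum of N absolute values.
bound = sigma_max * np.sqrt(2.0 * np.log(2.0 * N))

print(mc_mean, bound)
```

The bound grows only logarithmically in the number of coordinates, which is why taking the maximum over all $d^2$ pairs costs a factor of order $\sqrt{\log d}$.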
Proof of Lemma C.3. We have


IE [-Bn] < lEx 




E. 


1 " - - 

^J^bTvec(FO(Gi-G-))- 

2=1 


< Ex. 


\ i=l 


(D.13) 


The right hand side of (D.13) can be further bounded by 


Ex 


1 


n 


[bTvec(F0(G*-GO)] < 


2=1 




Ex<! - J^[bTvec(FO(G*-G0)]^ 
in.. 


2 = 1 


By the similar proof of Lemma C.l, we can show that [b~^vec(F 0 (G^ — 

G*))]^} = Op(l/n^). Therefore, we have E[Bn] = Op(l/n) = op(l). We complete the proof 
by applying Markov’s inequality. □ 